# https://github.com/apache/hamilton Project Manual

Generated at: 2026-06-24 21:46:57 UTC

## Table of Contents

- [Core Architecture: Driver, Nodes & DAGs](#page-1)
- [Function Modifiers, Lifecycle Hooks & Plugins](#page-2)
- [Graph Adapters, Execution & Caching](#page-3)
- [Hamilton UI, SDK & Data Tracking](#page-4)

<a id='page-1'></a>

## Core Architecture: Driver, Nodes & DAGs

### Related Pages

Related topics: [Function Modifiers, Lifecycle Hooks & Plugins](#page-2), [Graph Adapters, Execution & Caching](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [hamilton/driver.py](https://github.com/apache/hamilton/blob/main/hamilton/driver.py)
- [hamilton/node.py](https://github.com/apache/hamilton/blob/main/hamilton/node.py)
- [hamilton/graph.py](https://github.com/apache/hamilton/blob/main/hamilton/graph.py)
- [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)
- [hamilton/htypes.py](https://github.com/apache/hamilton/blob/main/hamilton/htypes.py)
- [hamilton/graph_types.py](https://github.com/apache/hamilton/blob/main/hamilton/graph_types.py)
- [hamilton/function_modifiers/](https://github.com/apache/hamilton/tree/main/hamilton/function_modifiers)
</details>

Apache Hamilton is a declarative, function-centric framework that turns a set of regular Python functions into a Directed Acyclic Graph (DAG) of data transformations. The runtime primitives are **nodes** (individual transformation steps), **the graph** (the DAG they compose into), and **the Driver** (the orchestrator that materializes requested outputs). This page covers the core architecture that powers every other feature in the project — including the UI, lifecycle adapters, and contrib integrations.

## High-Level Architecture

At a glance, Hamilton turns authored Python modules into a queryable, executable DAG. The `Driver` is the only public object most users need; it holds configuration, an adapter, and a graph that can be introspected or executed.

```mermaid
flowchart LR
    A[Python modules<br/>functions w/ type hints] --> B[Node.from_fn<br/>hamilton/node.py]
    B --> C[Graph / DAG<br/>hamilton/graph.py]
    C --> D[Driver<br/>hamilton/driver.py]
    D --> E[Adapter<br/>hamilton/base.py]
    D --> F[Executor]
    D --> G[Introspection API<br/>visualize, what_is_upstream_of, ...]
    D --> H[Results / Outputs]
```

Source: [hamilton/driver.py](https://github.com/apache/hamilton/blob/main/hamilton/driver.py), [hamilton/node.py](https://github.com/apache/hamilton/blob/main/hamilton/node.py), [hamilton/graph.py](https://github.com/apache/hamilton/blob/main/hamilton/graph.py), [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)

## The Node Primitive

A `Node` is the atomic unit of a Hamilton DAG. Each Python function with annotated parameters and a return type is wrapped into a `Node` by `Node.from_fn`. The node captures:

- A **name** (the output variable the function produces, or an explicit name when overridden by `@parameterize` / `@config`).
- A **type** (extracted from the return annotation, or declared via `htypes.column` / `htypes.custom_subclass`).
- **Input dependencies** (extracted from the parameter list — the names of upstream nodes it consumes).
- A reference to the **callable** itself.
- **Tags** and **metadata** (added by `@tag`, `@tag_outputs`, or `@check_output`).

The node construction logic lives in [hamilton/node.py](https://github.com/apache/hamilton/blob/main/hamilton/node.py). Function-modifying decorators live in the [hamilton/function_modifiers/](https://github.com/apache/hamilton/tree/main/hamilton/function_modifiers) package; for example, `@parameterize` splits a single function into many nodes, and `@config.when` swaps in alternative implementations based on the configuration passed to the Driver.

Source: [hamilton/node.py](https://github.com/apache/hamilton/blob/main/hamilton/node.py), [hamilton/function_modifiers/](https://github.com/apache/hamilton/tree/main/hamilton/function_modifiers), [hamilton/htypes.py](https://github.com/apache/hamilton/blob/main/hamilton/htypes.py)
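
The attributes listed above can be pictured with a small standard-library sketch. This is an illustration of the idea only, not Hamilton's implementation: `ToyNode` and `toy_node_from_function` are hypothetical names, and the real `Node` class carries more state than shown here.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, get_type_hints


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DAG node; not Hamilton's Node class."""

    name: str
    output_type: Any
    dependencies: Dict[str, Any]  # parameter name -> annotated type
    fn: Callable
    tags: Dict[str, str] = field(default_factory=dict)


def toy_node_from_function(fn: Callable) -> ToyNode:
    """Derive name, type and dependencies from the function's signature."""
    hints = get_type_hints(fn)
    if "return" not in hints:
        raise ValueError(f"{fn.__name__} is missing a return annotation")
    params = inspect.signature(fn).parameters
    missing = [p for p in params if p not in hints]
    if missing:
        raise ValueError(f"{fn.__name__} has unannotated parameters: {missing}")
    return ToyNode(
        name=fn.__name__,
        output_type=hints["return"],
        dependencies={p: hints[p] for p in params},
        fn=fn,
    )


def spend_per_signup(spend: float, signups: int) -> float:
    """Marketing spend divided by signups."""
    return spend / signups


node = toy_node_from_function(spend_per_signup)
print(node.name, node.output_type, sorted(node.dependencies))
```

The function name becomes the node name and each parameter name becomes a dependency on the node (or external input) of the same name, which is why no explicit wiring code is needed.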

## The Graph (DAG)

Once modules are passed to the `Driver`, the framework walks each function, builds `Node` objects, and assembles them into a `FunctionGraph`, defined in [hamilton/graph.py](https://github.com/apache/hamilton/blob/main/hamilton/graph.py). The graph enforces:

- **Acyclicity** — Hamilton validates that the dependency edges form a DAG; cycles raise an error during construction.
- **Reachability** — nodes not required to produce any requested output are pruned.
- **Type-aware compilation** — the graph resolves type compatibility and feeds it to the adapter.

On top of this graph, the `Driver` exposes the introspection API the README advertises: `visualize_execution`, `what_is_upstream_of`, `what_is_downstream_of`, `visualize_path_between`, and similar queries. These are documented in the lineage example ([examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md)) and implemented against the data structures in [hamilton/graph_types.py](https://github.com/apache/hamilton/blob/main/hamilton/graph_types.py).

Source: [hamilton/graph.py](https://github.com/apache/hamilton/blob/main/hamilton/graph.py), [hamilton/graph_types.py](https://github.com/apache/hamilton/blob/main/hamilton/graph_types.py), [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md)
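
The acyclicity and reachability guarantees can be sketched with the standard library alone. The snippet below is a simplified illustration under the assumption that edges are derived purely from parameter names; `build_edges`, `required_nodes`, and `execution_order` are hypothetical helpers, not functions from `hamilton/graph.py`.

```python
import inspect
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set


def build_edges(functions: List[Callable]) -> Dict[str, Set[str]]:
    """Map each function name to the parameter names it depends on."""
    return {fn.__name__: set(inspect.signature(fn).parameters) for fn in functions}


def required_nodes(edges: Dict[str, Set[str]], outputs: List[str]) -> Set[str]:
    """Walk upstream from the requested outputs; everything else is pruned."""
    needed: Set[str] = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(edges.get(name, ()))  # external inputs have no edges
    return needed


def execution_order(edges: Dict[str, Set[str]], outputs: List[str]) -> List[str]:
    """Topologically sort the pruned sub-graph; raises CycleError on a cycle."""
    needed = required_nodes(edges, outputs)
    sub_graph = {name: edges.get(name, set()) & needed for name in needed}
    return list(TopologicalSorter(sub_graph).static_order())


def a(raw: int) -> int:
    return raw + 1


def b(a: int) -> int:
    return a * 2


def unused(a: int) -> int:
    return -a


edges = build_edges([a, b, unused])
order = execution_order(edges, ["b"])
print(order)
```

Requesting only `b` leaves `unused` out of the plan entirely, and a pair of mutually dependent functions would surface as a `CycleError` before anything runs.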

## The Driver

The `Driver` class, defined in [hamilton/driver.py](https://github.com/apache/hamilton/blob/main/hamilton/driver.py), is the public entry point. It is constructed with:

- A **config** dict (resolved per-node through `@config` decorators).
- One or more **modules** containing node functions.
- An **adapter** (from [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)) that bridges Hamilton to an execution backend — pandas, polars, Spark, Ibis, etc.

The Driver's responsibilities are:

1. **Compile** the modules into a `Graph` and validate it.
2. **Resolve** requested outputs into a sub-DAG, including upstream prerequisites and overrides.
3. **Execute** nodes in topological order, feeding results to downstream consumers through the adapter.
4. **Report** progress, errors, and results to lifecycle hooks (relevant to community discussion in issue #1196, which requests finer-grained hooks for task-based parallel DAGs).

Source: [hamilton/driver.py](https://github.com/apache/hamilton/blob/main/hamilton/driver.py), [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)
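
The compile, resolve, and execute responsibilities above can be approximated in a few lines. This is a toy model, assuming dependencies are matched by parameter name and that the graph has already been validated as acyclic; `ToyDriver` is a hypothetical class and does not reproduce the real `Driver` API beyond the general shape of `execute`.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional


class ToyDriver:
    """Hypothetical miniature of the resolve/execute loop; not hamilton.driver.Driver."""

    def __init__(self, config: Dict[str, Any], *functions: Callable) -> None:
        self.config = config
        self.nodes = {fn.__name__: fn for fn in functions}

    def execute(
        self,
        final_vars: List[str],
        inputs: Optional[Dict[str, Any]] = None,
        overrides: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        # Config, inputs and overrides seed the results, so any node whose
        # name appears there is never computed.
        computed = {**self.config, **(inputs or {}), **(overrides or {})}

        def resolve(name: str) -> Any:
            if name in computed:
                return computed[name]
            if name not in self.nodes:
                raise KeyError(f"required input {name!r} was not provided")
            fn = self.nodes[name]
            # Depth-first resolution visits dependencies before consumers,
            # which yields a valid topological order for an acyclic graph.
            kwargs = {dep: resolve(dep) for dep in inspect.signature(fn).parameters}
            computed[name] = fn(**kwargs)
            return computed[name]

        return {name: resolve(name) for name in final_vars}


def doubled(x: int) -> int:
    return x * 2


def total(doubled: int, offset: int) -> int:
    return doubled + offset


dr = ToyDriver({"offset": 1}, doubled, total)
result = dr.execute(["total"], inputs={"x": 5})
overridden = dr.execute(["total"], overrides={"doubled": 100})
print(result, overridden)
```

In the second call the override replaces `doubled`, so its upstream input `x` is never needed.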

## Adapters and the Type System

Adapters in [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py) are the swap-in layer between the framework-agnostic DAG and a concrete execution or dataframe library. `DefaultAdapter` works with plain Python objects, while plugin packages provide integrations for `pandas`, `polars`, `pyspark`, `ibis`, and others. The type system uses standard Python type hints plus helpers in [hamilton/htypes.py](https://github.com/apache/hamilton/blob/main/hamilton/htypes.py) (e.g. `column[...]` for column-typed dataframes).

Validation is layered on top of nodes via `@check_output` and the `SchemaValidator` adapter, which the README highlights as a built-in feature for tracking and validating dataframe schemas at runtime.

Source: [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py), [hamilton/htypes.py](https://github.com/apache/hamilton/blob/main/hamilton/htypes.py), [README.md](https://github.com/apache/hamilton/blob/main/README.md)

## Common Failure Modes

Based on community discussions, the most common entry-point issues occur in layers adjacent to (not inside) the core architecture:

- **UI startup failures** (issue #1457) come from `django-ninja` configuration mismatches in the tracking server, not the core Driver. Updating UI dependencies usually resolves them.
- **Insufficient lifecycle hooks for parallel tasks** (issue #1196) — `TaskExecutionHook` is invoked per task, and users building custom progress bars for dynamic DAGs need additional hooks that have not yet been exposed.

The core `Driver` / `FunctionGraph` / `Node` trio, however, is generally stable: the most common authoring errors are missing type annotations (Hamilton needs them to build and type-check nodes) and accidental cycles (caught when the graph is built).

## See Also

- [Lineage & Introspection](https://hamilton.apache.org/concepts/lineage/) — examples in `examples/lineage/`
- [UI Tracking Server Models](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_template/models.py) — the data model that consumes the Driver's execution metadata
- [Language Server](https://github.com/apache/hamilton/tree/main/dev_tools/language_server) — editor integration built on top of the same node model

---

<a id='page-2'></a>

## Function Modifiers, Lifecycle Hooks & Plugins

### Related Pages

Related topics: [Core Architecture: Driver, Nodes & DAGs](#page-1), [Graph Adapters, Execution & Caching](#page-3), [Hamilton UI, SDK & Data Tracking](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [hamilton/function_modifiers/base.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/base.py)
- [hamilton/function_modifiers/configuration.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/configuration.py)
- [hamilton/function_modifiers/expanders.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/expanders.py)
- [hamilton/lifecycle/base.py](https://github.com/apache/hamilton/blob/main/hamilton/lifecycle/base.py)
- [hamilton/lifecycle/api.py](https://github.com/apache/hamilton/blob/main/hamilton/lifecycle/api.py)
- [hamilton/plugins/__init__.py](https://github.com/apache/hamilton/blob/main/hamilton/plugins/__init__.py)
</details>

## Purpose and Scope

Apache Hamilton provides three complementary extension mechanisms that allow users to customize DAG construction, observe execution, and integrate with external systems without modifying the core library:

1. **Function Modifiers** — Python decorators (`@config.when()`, `@tag`, `@tag_outputs`, `@check_output`, etc.) that alter how a function participates in the DAG at *build time*.
2. **Lifecycle Hooks** — Callback objects registered with the `Driver` that fire at well-defined points during *execution time*.
3. **Plugins** — Adapter and integration packages (under `hamilton.plugins`) that plug Hamilton into dataframes (pandas, polars, Ibis, Spark, etc.), telemetry backends, and observability stacks.

Together they form the customization surface that the README highlights with the statement "Built for plugins. Apache Hamilton is designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack." ([README.md](https://github.com/apache/hamilton/blob/main/README.md))

## Function Modifiers

Function modifiers live under `hamilton/function_modifiers/` and are implemented as decorator factories that wrap a user function and return one or more *resolved* functions suitable for inclusion in the DAG. Source: [hamilton/function_modifiers/base.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/base.py)

The base abstractions include:

| Modifier | Role | Source |
|---|---|---|
| `NodeTransformLifecycle` | Root class for decorators that rewrite or expand a function's nodes | [function_modifiers/base.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/base.py) |
| `@config.when(...)` / `@config.when_not(...)` | Conditionally include a node based on driver configuration keys | [function_modifiers/configuration.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/configuration.py) |
| `@parameterize`, `@parameterize_values`, `@parameterize_sources` | Fan a single function into multiple parameterized nodes | [function_modifiers/expanders.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/expanders.py) |
| `@tag`, `@tag_outputs` | Attach arbitrary key/value metadata to nodes for filtering and lineage | [function_modifiers/metadata.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/metadata.py) |
| `@check_output` | Validate returned values against declared expectations at runtime | [function_modifiers/validation.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/validation.py) |
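
The conditional-inclusion row of the table can be illustrated with a plain-Python sketch of build-time resolution. It assumes the convention that alternative implementations share a node name and differ by a `__suffix`; the `when` decorator, the `_when` attribute, and `resolve_nodes` below are hypothetical and only mimic the behaviour, they are not Hamilton's code.

```python
from typing import Any, Callable, Dict, List


def when(**conditions: Any) -> Callable[[Callable], Callable]:
    """Toy config-conditional decorator: record the condition on the function."""

    def decorator(fn: Callable) -> Callable:
        fn._when = conditions  # hypothetical attribute name
        return fn

    return decorator


def resolve_nodes(functions: List[Callable], config: Dict[str, Any]) -> Dict[str, Callable]:
    """Keep functions whose conditions match config; strip '__suffix' from names."""
    resolved: Dict[str, Callable] = {}
    for fn in functions:
        conditions = getattr(fn, "_when", {})
        if all(config.get(key) == value for key, value in conditions.items()):
            node_name = fn.__name__.split("__")[0]
            if node_name in resolved:
                raise ValueError(f"multiple implementations resolved for {node_name!r}")
            resolved[node_name] = fn
    return resolved


@when(region="US")
def tax_rate__us() -> float:
    return 0.07


@when(region="EU")
def tax_rate__eu() -> float:
    return 0.20


def price_with_tax(price: float, tax_rate: float) -> float:
    return price * (1 + tax_rate)


nodes = resolve_nodes([tax_rate__us, tax_rate__eu, price_with_tax], {"region": "EU"})
print(sorted(nodes))
```

Downstream functions such as `price_with_tax` depend on `tax_rate` without knowing which implementation the configuration selected.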

Because modifiers are applied at DAG-build time, they compose: a `@parameterize(...)`-expanded set of nodes can itself be wrapped in `@tag(owner="team-x")`, and the resulting tags flow into lineage queries such as `dr.what_is_upstream_of(...)` demonstrated in [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md). The `SchemaValidator()` adapter mentioned in the README is a related but separate integration: it is attached to the driver as an adapter and inspects dataframe schemas at runtime.
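
A minimal sketch of that composition, using only the standard library: one function is expanded into several named nodes, and tags are then applied to every expanded node. `toy_parameterize` and `toy_tag` are hypothetical helpers written for this illustration; the real decorators have richer semantics.

```python
from functools import partial
from typing import Any, Callable, Dict, List, Tuple

# (node name, callable, tags); a deliberately simplified node representation.
ToyNode = Tuple[str, Callable[..., Any], Dict[str, str]]


def toy_parameterize(fn: Callable, variants: Dict[str, Dict[str, Any]]) -> List[ToyNode]:
    """Create one node per variant by binding that variant's keyword arguments."""
    return [(name, partial(fn, **bound), {}) for name, bound in variants.items()]


def toy_tag(nodes: List[ToyNode], **tags: str) -> List[ToyNode]:
    """Merge the given tags into every node produced by an earlier expansion."""
    return [(name, fn, {**existing, **tags}) for name, fn, existing in nodes]


def rolling_mean(values: List[float], window: int) -> float:
    return sum(values[-window:]) / window


nodes = toy_tag(
    toy_parameterize(rolling_mean, {"mean_3": {"window": 3}, "mean_5": {"window": 5}}),
    owner="team-x",
)

# Tags can then drive filtering, for example "all nodes owned by team-x".
by_owner = [name for name, _, tags in nodes if tags.get("owner") == "team-x"]
print(by_owner)
```

Because both steps happen before execution, the tag filter works on the static graph without running any node.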

## Lifecycle Hooks

Where function modifiers affect *what* the DAG contains, lifecycle hooks affect *when observers are notified*. Source: [hamilton/lifecycle/base.py](https://github.com/apache/hamilton/blob/main/hamilton/lifecycle/base.py)

The `hamilton.lifecycle.api` module ([hamilton/lifecycle/api.py](https://github.com/apache/hamilton/blob/main/hamilton/lifecycle/api.py)) exposes abstract hook classes such as:

- `NodeExecutionHook`: its `run_before_node_execution` / `run_after_node_execution` methods fire before and after each node runs.
- `TaskExecutionHook`: its `run_before_task_execution` / `run_after_task_execution` methods fire around each task in task-based parallel DAGs.
- `GraphExecutionHook`: its `run_before_graph_execution` / `run_after_graph_execution` methods fire once per `Driver.execute(...)` call.

These user-facing classes build on lower-level, single-purpose base classes in `hamilton/lifecycle/base.py` (for example `BasePreNodeExecute` and `BasePostNodeExecute`).

Hooks are registered through the `adapter` argument of the `Driver` constructor or through `driver.Builder().with_adapters(...)`. Each hook method receives keyword arguments describing the node (its name, tags, and inputs) and, for post-hooks, its result or error.

Community issue [#1196](https://github.com/apache/hamilton/issues/1196) explicitly requests *additional* hook firing points for **dynamic DAGs and parallel execution** — specifically, hooks that fire inside a running task rather than only at the per-node boundary. Until such hooks land, contributors building custom progress-bar or `rich`-based observers must wrap user functions manually rather than relying on a generic `TaskExecutionHook`.
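
The before/after notification pattern, and the defensive dispatch that custom observers may want while the hook surface evolves, can be sketched as follows. This is a standalone illustration: `TimingHook`, `call_hooks`, and `run_node` are hypothetical, and the keyword arguments passed to the hook methods are assumptions chosen for the example rather than Hamilton's exact signatures.

```python
import time
from typing import Any, Callable, Dict, List


class TimingHook:
    """Hypothetical observer that records events and per-node durations."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.durations: Dict[str, float] = {}
        self._started: Dict[str, float] = {}

    def run_before_node_execution(self, *, node_name: str, **_: Any) -> None:
        self.events.append(f"start:{node_name}")
        self._started[node_name] = time.perf_counter()

    def run_after_node_execution(self, *, node_name: str, success: bool, **_: Any) -> None:
        self.durations[node_name] = time.perf_counter() - self._started.pop(node_name)
        self.events.append(f"{'ok' if success else 'failed'}:{node_name}")


def call_hooks(hooks: List[Any], method: str, **kwargs: Any) -> None:
    """Invoke `method` on every hook that defines it; skip hooks that do not."""
    for hook in hooks:
        handler = getattr(hook, method, None)
        if callable(handler):
            handler(**kwargs)


def run_node(name: str, fn: Callable, kwargs: Dict[str, Any], hooks: List[Any]) -> Any:
    """Run one node, notifying observers before and after (even on failure)."""
    call_hooks(hooks, "run_before_node_execution", node_name=name, node_kwargs=kwargs)
    result, error = None, None
    try:
        result = fn(**kwargs)
        return result
    except Exception as exc:
        error = exc
        raise
    finally:
        call_hooks(
            hooks,
            "run_after_node_execution",
            node_name=name,
            success=error is None,
            error=error,
            result=result,
        )


hook = TimingHook()
# object() stands in for an observer that implements none of the hook methods.
value = run_node("double", lambda x: x * 2, {"x": 21}, [hook, object()])
print(value, hook.events)
```

Accepting `**_` in each hook method keeps the observer tolerant of extra keyword arguments that a newer runner might pass.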

## Plugin System

Hamilton's plugin layer is the broadest of the three: it includes dataframe adapters, result builders, telemetry collectors, and integrations such as `SchemaValidator`. Source: [hamilton/plugins/__init__.py](https://github.com/apache/hamilton/blob/main/hamilton/plugins/__init__.py)

Plugins typically implement one of the abstract base classes defined in `hamilton/base.py` or `hamilton/lifecycle/base.py` and are then exposed via entry points or direct imports. For example, dataframe adapters implement `base.HamiltonGraphAdapter` (e.g. the default `DefaultAdapter`), while result builders implement `base.ResultBuilder` to control how per-node outputs are collected (DataFrame concat, dict, list, etc.).
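
To make the result-builder role concrete, here is a small sketch of two ways to collect per-node outputs. The classes are hypothetical and standard-library only; they share a `build_result(**outputs)` entry point, which is an assumption made for this illustration rather than a statement of the exact base-class contract.

```python
from typing import Any, Dict, List


class ToyDictResult:
    """Return the requested outputs unchanged, keyed by node name."""

    @staticmethod
    def build_result(**outputs: Any) -> Dict[str, Any]:
        return dict(outputs)


class ToyRowsResult:
    """Zip equal-length column outputs into a list of row dictionaries."""

    @staticmethod
    def build_result(**outputs: List[Any]) -> List[Dict[str, Any]]:
        lengths = {len(column) for column in outputs.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns have different lengths: {sorted(lengths)}")
        names = list(outputs)
        return [dict(zip(names, row)) for row in zip(*outputs.values())]


outputs = {"a": [1, 2], "b": [10, 20]}
as_dict = ToyDictResult.build_result(**outputs)
as_rows = ToyRowsResult.build_result(**outputs)
print(as_dict, as_rows)
```

Swapping the builder changes only the shape of the final result; the nodes that produced `a` and `b` are untouched.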

The plugin layer also underpins the Hamilton UI: the SDK in `ui/sdk/src/hamilton_sdk/` ships an adapter that posts node results and metadata to the tracking server, and the server-side models in [ui/backend/server/trackingserver_base/models.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_base/models.py) persist tagged attributes via the flexible `attributes` table design documented in that file ("plugins … will define the types, which we intend to place in a centralized location").

A current friction point surfaced in community issue [#1457](https://github.com/apache/hamilton/issues/1457) is that the bundled `hamilton ui` command fails to start on a fresh install because the pinned `django-ninja` version removed the legacy `Config` class used by `ModelSchema`. This is a dependency-compatibility failure inside the *UI plugin* and is tracked separately from the core library.

## How the Three Layers Cooperate

```mermaid
flowchart LR
    A[User Function] -->|decorated with| B[Function Modifiers]
    B --> C[Resolved DAG]
    C --> D[Driver.execute]
    D -->|fires| E[Lifecycle Hooks]
    E --> F[Observers / Progress Bars]
    D -->|uses| G[Plugins]
    G --> H[Dataframe Adapters]
    G --> I[Result Builders]
    G --> J[Tracking SDK / UI]
```

A typical pipeline therefore moves left-to-right: modifiers shape the static DAG, the driver walks that DAG during execution while emitting lifecycle events, and plugins consume both the DAG metadata (tags, schemas) and the runtime events (per-node results) to deliver end-to-end behavior.

## Common Failure Modes

- **Decorator ordering matters.** Stacking `@parameterize` *below* `@config.when` yields different node counts than the reverse; the inner decorator runs first.
- **Hook version drift.** As noted in [#1196](https://github.com/apache/hamilton/issues/1196), the available hook surface for parallel / dynamic DAGs is still evolving, so custom observers should guard against missing hook methods.
- **UI dependency breakage.** Issue [#1457](https://github.com/apache/hamilton/issues/1457) shows that the UI plugin can fail to boot due to upstream `django-ninja` changes — pin compatible versions or run the UI via the Docker image until the constraint is updated.
- **Plugin attribute schema.** Because the tracking server stores plugin metadata as free-form `attributes` ([trackingserver_base/models.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_base/models.py)), consumer code must tolerate missing keys.

## See Also

- [hamilton/function_modifiers/](https://github.com/apache/hamilton/tree/main/hamilton/function_modifiers): full decorator reference
- [hamilton/lifecycle/](https://github.com/apache/hamilton/tree/main/hamilton/lifecycle): hook base classes and adapters
- [hamilton/plugins/](https://github.com/apache/hamilton/tree/main/hamilton/plugins): bundled adapters and integrations
- [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md): lineage queries driven by `@tag`
- [examples/LLM_Workflows/pdf_summarizer/README.md](https://github.com/apache/hamilton/blob/main/examples/LLM_Workflows/pdf_summarizer/README.md): production example using `@config.when()`

---

<a id='page-3'></a>

## Graph Adapters, Execution & Caching

### Related Pages

Related topics: [Core Architecture: Driver, Nodes & DAGs](#page-1), [Function Modifiers, Lifecycle Hooks & Plugins](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [hamilton/execution/graph_functions.py](https://github.com/apache/hamilton/blob/main/hamilton/execution/graph_functions.py)
- [hamilton/execution/executors.py](https://github.com/apache/hamilton/blob/main/hamilton/execution/executors.py)
- [hamilton/execution/grouping.py](https://github.com/apache/hamilton/blob/main/hamilton/execution/grouping.py)
- [hamilton/async_driver.py](https://github.com/apache/hamilton/blob/main/hamilton/async_driver.py)
- [hamilton/caching/adapter.py](https://github.com/apache/hamilton/blob/main/hamilton/caching/adapter.py)
- [hamilton/caching/stores/file.py](https://github.com/apache/hamilton/blob/main/hamilton/caching/stores/file.py)
- [hamilton/driver.py](https://github.com/apache/hamilton/blob/main/hamilton/driver.py)
- [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)
- [hamilton/function_modifiers/base.py](https://github.com/apache/hamilton/blob/main/hamilton/function_modifiers/base.py)
- [README.md](https://github.com/apache/hamilton/blob/main/README.md)
- [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md)
</details>

## Overview

Apache Hamilton is a function-centric declarative framework for describing dataflows as directed acyclic graphs (DAGs). The three concerns covered on this page — graph adapters, execution, and caching — together determine how a DAG authored in Python actually runs and how it is observed.

The framework is "designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack" (Source: [README.md](https://github.com/apache/hamilton/blob/main/README.md)). Graph adapters are the primary integration point: they implement `base.HamiltonGraphAdapter` (for example `base.SimplePythonGraphAdapter` and its subclass `base.DefaultAdapter`) and control how each node is executed and how the final result is assembled (Source: [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)).

Execution is organized in layers: the user-facing `Driver` (synchronous) and `AsyncDriver` (asynchronous) construct a `FunctionGraph` from the supplied Python modules and then dispatch node execution through a pluggable executor (Source: [hamilton/driver.py](https://github.com/apache/hamilton/blob/main/hamilton/driver.py), [hamilton/async_driver.py](https://github.com/apache/hamilton/blob/main/hamilton/async_driver.py)). Caching is implemented as an adapter that intercepts node execution, consults a backing store, and short-circuits recomputation when a previously computed value is found (Source: [hamilton/caching/adapter.py](https://github.com/apache/hamilton/blob/main/hamilton/caching/adapter.py)).

## Graph Adapters

A graph adapter sits between the `FunctionGraph` and the code that runs it. By default, `base.DefaultAdapter` calls each node's Python function directly, but the abstraction exists so that execution backends (Spark, Dask, Ray, async, etc.) can override methods such as `execute_node()` and `build_result()` (Source: [hamilton/base.py](https://github.com/apache/hamilton/blob/main/hamilton/base.py)).

```python
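from hamilton import base, driver

# `data_loading` and `features` stand in for your own modules of Hamilton functions.
import data_loading
import features

# DefaultAdapter executes each node's function directly in the current process.
adapter = base.DefaultAdapter()
dr = driver.Driver({}, data_loading, features, adapter=adapter)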
```

The lineage example shows the same `Driver` being used to ask questions of the DAG without ever running it (Source: [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md)). The driver's `visualize_execution()`, `what_is_upstream_of()`, and `visualize_path_between()` methods all read from the `FunctionGraph` the driver has built, so they do not require executing any node.

## Execution Model

Execution is decoupled from DAG construction. Once the driver has materialized the `FunctionGraph`, it delegates to an executor: a strategy object that knows how to run a set of nodes while honoring their dependencies. The default executor walks the graph synchronously; alternative executors cover multi-threading, multi-processing, async, and parallel-task modes (Source: [hamilton/execution/executors.py](https://github.com/apache/hamilton/blob/main/hamilton/execution/executors.py)).

Helpers in `hamilton/execution/graph_functions.py` and `hamilton/execution/grouping.py` provide the utilities executors use to slice the graph into task groups, resolve dependencies, and determine which nodes can run in parallel. The `AsyncDriver` reuses the same grouping logic but awaits node coroutines on the event loop (Source: [hamilton/async_driver.py](https://github.com/apache/hamilton/blob/main/hamilton/async_driver.py)).
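
The idea of slicing a graph into groups that can run concurrently can be sketched with `graphlib` and a thread pool. This is a generic level-by-level scheduler written for illustration; `parallel_levels` and `run_levels` are hypothetical and do not reproduce the grouping strategy in `hamilton/execution/grouping.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set


def parallel_levels(edges: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into levels; nodes within a level do not depend on each other."""
    sorter = TopologicalSorter(edges)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels


def run_levels(
    functions: Dict[str, Callable],
    edges: Dict[str, Set[str]],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Run each level's nodes concurrently, waiting for a level before the next."""
    results = dict(inputs)
    with ThreadPoolExecutor() as pool:
        for level in parallel_levels(edges):
            pending = {
                name: pool.submit(functions[name], **{d: results[d] for d in edges[name]})
                for name in level
                if name in functions  # external inputs are already in results
            }
            for name, future in pending.items():
                results[name] = future.result()
    return results


edges = {"raw": set(), "left": {"raw"}, "right": {"raw"}, "joined": {"left", "right"}}
functions = {
    "left": lambda raw: raw + 1,
    "right": lambda raw: raw * 2,
    "joined": lambda left, right: left + right,
}
levels = parallel_levels(edges)
results = run_levels(functions, edges, {"raw": 3})
print(levels, results["joined"])
```

Level-based scheduling is the simplest option; a finer-grained scheduler could start `joined` as soon as both of its parents finish instead of waiting for the whole level.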

The community has been actively extending this surface. Issue #1196 requests that `TaskExecutionHook` fire at finer granularity inside task-based parallel DAGs so that progress bars (e.g. with `rich`) can reflect sub-step progress rather than only coarse pre/post hooks (Source: community context, #1196). Work in this area has continued across recent releases, including documentation fixes for parallel-task modes (Source: release notes for `v1.90.0-incubating`).

## Caching

Caching in Hamilton is opt-in and implemented as an adapter rather than as executor logic. You enable it when building the driver; before a node runs, the adapter checks a backing store for a matching key and returns the cached value if present (Source: [hamilton/caching/adapter.py](https://github.com/apache/hamilton/blob/main/hamilton/caching/adapter.py)).

```python
from hamilton import driver
from hamilton.caching.stores.file import FileResultStore

import my_dataflow  # placeholder for your own module of Hamilton functions

# with_cache() attaches the caching adapter; cached results go to the result store.
dr = (
    driver.Builder()
    .with_modules(my_dataflow)
    .with_cache(result_store=FileResultStore(path="./cache"))
    .build()
)
```

`FileResultStore` is the simplest backing implementation, serializing cached values to disk under a hashed key (Source: [hamilton/caching/stores/file.py](https://github.com/apache/hamilton/blob/main/hamilton/caching/stores/file.py)). The store API is intentionally narrow, centered on `get`, `set`, and `exists`, so custom stores (S3, Redis, in-memory) can be plugged in by implementing the same interface.

A key design point is that caching adapters compose with execution adapters. The driver passes the resulting `FunctionGraph` to the executor, so cached values short-circuit execution transparently without the executor needing to know about the cache (Source: [hamilton/caching/adapter.py](https://github.com/apache/hamilton/blob/main/hamilton/caching/adapter.py)).
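
The short-circuit behaviour can be sketched without Hamilton at all. The example below assumes a cache key derived from the node name, the function's bytecode and constants, and its inputs; that keying scheme, along with `InMemoryStore`, `cache_key`, and `execute_node_cached`, is a hypothetical simplification and not how the real adapter versions code and data.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, List, Tuple


class InMemoryStore:
    """Hypothetical store exposing a narrow get/set/exists surface."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._data

    def get(self, key: str) -> Any:
        return pickle.loads(self._data[key])

    def set(self, key: str, value: Any) -> None:
        self._data[key] = pickle.dumps(value)


def cache_key(name: str, fn: Callable, kwargs: Dict[str, Any]) -> str:
    """Hash name, code and inputs so that a change to any of them misses the cache."""
    digest = hashlib.sha256()
    digest.update(name.encode())
    digest.update(fn.__code__.co_code)
    digest.update(repr(fn.__code__.co_consts).encode())
    digest.update(pickle.dumps(sorted(kwargs.items())))
    return digest.hexdigest()


def execute_node_cached(
    name: str, fn: Callable, kwargs: Dict[str, Any], store: InMemoryStore
) -> Tuple[Any, bool]:
    """Return (value, was_cache_hit); only call fn on a cache miss."""
    key = cache_key(name, fn, kwargs)
    if store.exists(key):
        return store.get(key), True
    value = fn(**kwargs)
    store.set(key, value)
    return value, False


calls: List[int] = []


def slow_square(x: int) -> int:
    calls.append(x)  # records how often the function body actually runs
    return x * x


store = InMemoryStore()
first = execute_node_cached("slow_square", slow_square, {"x": 4}, store)
second = execute_node_cached("slow_square", slow_square, {"x": 4}, store)
third = execute_node_cached("slow_square", slow_square, {"x": 5}, store)
print(first, second, third, calls)
```

Because the lookup wraps the call to `fn`, whatever schedules the node does not need to know a cache exists, which is the composition property described above.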

## Community Considerations

A few community-observed issues are worth noting when working with the execution and caching layers:

- **UI startup regression**: Issue #1457 reports that a fresh `hamilton ui` install fails with `django-ninja` raising `ConfigError: The use of Config class is removed for ModelSchema, use 'Meta' instead`. Until this is resolved, fresh UI installs may not boot out of the box (Source: community context, #1457).
- **Python support**: Release `v1.90.0-incubating` drops Python 3.9 and adds 3.13 support. Code that uses the execution or caching layer should be validated on the supported interpreter range (Source: release notes for `v1.90.0-incubating`).
- **Parallel-task lifecycle**: As noted above, current task-based lifecycle hooks are coarse. Implementers building custom progress reporting for parallel DAGs should expect the API to evolve.

## See Also

- [Core Architecture: Driver, Nodes & DAGs](#page-1)
- [Function Modifiers, Lifecycle Hooks & Plugins](#page-2)
- [Hamilton UI, SDK & Data Tracking](#page-4)
- [Lineage example](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md)

---

<a id='page-4'></a>

## Hamilton UI, SDK & Data Tracking

### Related Pages

Related topics: [Function Modifiers, Lifecycle Hooks & Plugins](#page-2), [Graph Adapters, Execution & Caching](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [ui/sdk/src/hamilton_sdk/adapters.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/adapters.py)
- [ui/sdk/src/hamilton_sdk/driver.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/driver.py)
- [ui/sdk/src/hamilton_sdk/tracking/runs.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/tracking/runs.py)
- [ui/backend/README.md](https://github.com/apache/hamilton/blob/main/ui/backend/README.md)
- [ui/backend/server/server/settings.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/server/settings.py)
- [ui/backend/server/trackingserver_base/models.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_base/models.py)
- [ui/backend/server/trackingserver_run_tracking/api.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_run_tracking/api.py)
- [ui/backend/server/trackingserver_projects/api.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_projects/api.py)
- [examples/mlflow/README.md](https://github.com/apache/hamilton/blob/main/examples/mlflow/README.md)
- [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md)
- [README.md](https://github.com/apache/hamilton/blob/main/README.md)
- [scripts/README.md](https://github.com/apache/hamilton/blob/main/scripts/README.md)
</details>

## Overview and Scope

Apache Hamilton ships with an end-to-end observability stack for declarative dataflows. It is composed of two independently versioned packages, `apache-hamilton-sdk` (the client side) and `apache-hamilton-ui` (the server side), that sit alongside the core `apache-hamilton` library. The SDK provides a tracker adapter that captures run metadata from a Hamilton `Driver` and sends it to the UI server, which then renders DAG visualizations, run history, and a self-service data catalog.

The server's purpose is summarized in [ui/backend/README.md](https://github.com/apache/hamilton/blob/main/ui/backend/README.md) as: "Visualize Hamilton DAGs and their execution history, track inputs, outputs, and runtime metadata for each DAG run, compare runs across versions and configurations". It is designed to be self-hosted, either locally or via Docker, on port `8242`. The release process in [scripts/README.md](https://github.com/apache/hamilton/blob/main/scripts/README.md) treats these as separately versioned artifacts and notes that the `hamilton` core package must be released first, with `sdk`, `ui`, `contrib`, and `lsp` following.

```mermaid
flowchart LR
    A[Hamilton code<br/>user modules] --> B[Driver]
    B --> C[HamiltonTracker<br/>adapter]
    C --> D[Tracking API<br/>POST runs]
    D --> E[(Backend DB<br/>Postgres / SQLite)]
    E --> F[Web UI<br/>DAG catalog]
    F --> G[Browser<br/>localhost:8242]
```

## SDK: The `HamiltonTracker` Adapter

The client side of the tracking stack lives under `ui/sdk/src/hamilton_sdk/` and exposes a `HamiltonTracker` adapter. The recommended integration pattern, as documented in [ui/backend/README.md](https://github.com/apache/hamilton/blob/main/ui/backend/README.md), is:

```python
from hamilton_sdk import adapters
from hamilton import driver

tracker = adapters.HamiltonTracker(
    project_id=PROJECT_ID,
    username=YOUR_EMAIL,
    dag_name="my_dag",
)
dr = (
    driver.Builder()
    .with_config(your_config)
    .with_modules(*your_modules)
    .with_adapters(tracker)
    .build()
)
```

The adapter hooks into the standard Hamilton `Driver` lifecycle (see [ui/sdk/src/hamilton_sdk/adapters.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/adapters.py)) and assembles structured run records using the helpers in [ui/sdk/src/hamilton_sdk/tracking/runs.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/tracking/runs.py); these records are then serialized and sent to the server. The same mechanism is used in the [examples/mlflow/README.md](https://github.com/apache/hamilton/blob/main/examples/mlflow/README.md) integration, where `HamiltonTracker` is paired with the `MLFlowTracker` to combine Hamilton lineage with MLflow experiment tracking.

A `hamilton_sdk` driver builder wrapper is also available via [ui/sdk/src/hamilton_sdk/driver.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/driver.py) for users who prefer a single-call construction over the `driver.Builder` fluent API.

## UI Backend: Server, API, and Data Models

The UI backend is a Django application under `ui/backend/server/`. Configuration is centralized in [ui/backend/server/server/settings.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/server/settings.py). The API is split into Django apps; the two most relevant for tracking are `trackingserver_run_tracking` and `trackingserver_projects`, exposed through [ui/backend/server/trackingserver_run_tracking/api.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_run_tracking/api.py) and [ui/backend/server/trackingserver_projects/api.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_projects/api.py) respectively. The run-tracking endpoints accept the payloads emitted by the SDK and persist them; the projects API manages project metadata and membership.

The persistence layer uses Django ORM models defined in [ui/backend/server/trackingserver_base/models.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/trackingserver_base/models.py). The file declares an abstract base class with `name`, `type`, `schema_version`, and a JSON `value` field, plus a custom `ArrayField` that "acts like an ArrayField when using PostgreSQL and as a serialized TextField when using SQLite" — implemented by overriding `db_type`, `from_db_value`, `to_python`, `get_db_prep_save`, and `get_prep_value`. The design rationale is captured in the module docstring: storing fields relationally (rather than as a single JSON blob) "allows for fine-grained removal of information" when a customer requests deletion of a specific attribute.
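
The dual behaviour of that custom field can be illustrated without Django. The sketch below assumes JSON as the text serialization for SQLite and borrows three of the overridden method names purely for readability; `ToyArrayColumn` is hypothetical, and the serialization format used by the real field is not confirmed here.

```python
import json
from typing import Any, List, Optional


class ToyArrayColumn:
    """Pure-Python illustration only; the real field subclasses Django field classes."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor  # e.g. "postgresql" or "sqlite"

    def db_type(self) -> str:
        # Native array column on PostgreSQL, plain text everywhere else.
        return "text[]" if self.vendor == "postgresql" else "text"

    def get_prep_value(self, value: Optional[List[str]]) -> Any:
        """Convert the Python value into what the database driver should store."""
        if value is None or self.vendor == "postgresql":
            return value
        return json.dumps(value)  # assumed text encoding for this sketch

    def from_db_value(self, value: Any) -> Optional[List[str]]:
        """Convert the stored value back into a Python list."""
        if value is None or self.vendor == "postgresql":
            return value
        return json.loads(value)


sqlite_column = ToyArrayColumn("sqlite")
postgres_column = ToyArrayColumn("postgresql")

stored = sqlite_column.get_prep_value(["a", "b"])
restored = sqlite_column.from_db_value(stored)
print(sqlite_column.db_type(), stored, restored)
```

Callers always see a Python list; only the storage representation differs between the two backends.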

## Common Issues and Operational Notes

### Startup failure with `django-ninja` (issue #1457)
A frequent blocker for first-time users is a `django-ninja` `ConfigError` raised at UI startup: *"The use of `Config` class is removed for ModelSchema, use 'Meta' instead."* It surfaces when running `hamilton ui` against a fresh install. Workarounds documented in the issue thread include pinning compatible versions of the UI's optional dependencies or upgrading the bundled `django-ninja` to a release that supports the new `Meta`-based schema configuration. Version constraints are resolved through the UI's `pyproject.toml` / `requirements.txt`; Django-level configuration lives in [ui/backend/server/server/settings.py](https://github.com/apache/hamilton/blob/main/ui/backend/server/server/settings.py).
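When diagnosing this kind of startup failure, it helps to confirm which versions are actually installed in the active environment before changing any pins. The snippet below is a small sketch using only the standard library. The helper name `installed_version` is hypothetical, and the snippet deliberately does not name a "good" version, since the compatible range should be taken from the issue thread.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the packages most relevant to the UI backend startup error.
for dist in ("django-ninja", "Django"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

Running this inside the same virtual environment that launches the UI avoids mistaking a globally installed version for the one the server imports.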

### Lifecycle adapters for dynamic / parallel DAGs (issue #1196)
The SDK's lifecycle hooks — surfaced through `HamiltonTracker` and related adapters in [ui/sdk/src/hamilton_sdk/adapters.py](https://github.com/apache/hamilton/blob/main/ui/sdk/src/hamilton_sdk/adapters.py) — were originally designed for static DAGs. For task-based parallel execution, `TaskExecutionHook` only fires *before* a task group, which makes it difficult to render multi-level progress or post-task telemetry. The community is tracking requests to add finer-grained per-task hooks so that adapters like the `rich`-based progress bars described in issue #1196 can report completion correctly.

### Lineage as code
Independent of the UI, the Driver itself can answer lineage questions in-process — for example, `dr.what_is_upstream_of("fit_random_forest")` and `dr.visualize_execution(...)` are demonstrated in [examples/lineage/README.md](https://github.com/apache/hamilton/blob/main/examples/lineage/README.md). The UI augments this with persistent, time-stamped lineage by storing the same information as it is executed, but the in-process API is the source of truth for DAG structure.
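Conceptually, an upstream query is a transitive walk over each node's dependencies. The sketch below shows that idea on a plain dictionary. The dependency map and every node name other than `fit_random_forest` are hypothetical, and `upstream_of` is an illustrative helper rather than the Driver's implementation. The Driver's own method may return richer objects or treat the queried node differently.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each node lists the nodes it reads from.
DEPENDENCIES: Dict[str, List[str]] = {
    "raw_data": [],
    "features": ["raw_data"],
    "labels": ["raw_data"],
    "fit_random_forest": ["features", "labels"],
    "report": ["fit_random_forest"],
}


def upstream_of(node: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Collect every transitive dependency of `node`, excluding `node` itself."""
    seen: Set[str] = set()
    stack = list(deps[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


ancestors = upstream_of("fit_random_forest", DEPENDENCIES)
print(sorted(ancestors))  # ['features', 'labels', 'raw_data']
```

Because the walk only needs the graph structure, it can run without executing anything, which is why lineage questions can be answered before a run is launched.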

### Deployment matrix
The package layout in [scripts/README.md](https://github.com/apache/hamilton/blob/main/scripts/README.md) clarifies that the UI is a separate release artifact (`apache-hamilton-ui` built from `ui/backend`) and that it depends on, but is not a part of, the core `hamilton` package. Representative examples are bundled in the source distribution for end-to-end verification, while the wheel excludes them. Users upgrading between releases should review the per-package `CHANGELOG` rather than relying on the umbrella `apache-hamilton` version.

## See Also

- Hamilton core concepts: nodes, functions, and the Driver
- Function modifiers and lifecycle adapters (`@check_output`, `SchemaValidator`)
- LLM workflow examples: [Retrieval-Augmented Generation](https://github.com/apache/hamilton/tree/main/examples/LLM_Workflows/retrieval_augmented_generation), [GraphRAG](https://github.com/apache/hamilton/tree/main/examples/LLM_Workflows/GraphRAG), [PDF Summarizer](https://github.com/apache/hamilton/tree/main/examples/LLM_Workflows/pdf_summarizer)
- Ibis integration: [jaffle_shop](https://github.com/apache/hamilton/tree/main/examples/ibis/jaffle_shop), [feature_engineering](https://github.com/apache/hamilton/tree/main/examples/ibis/feature_engineering)
- MLflow tracker integration: [examples/mlflow](https://github.com/apache/hamilton/tree/main/examples/mlflow)
- Lineage patterns: [examples/lineage](https://github.com/apache/hamilton/tree/main/examples/lineage)

---

## Pitfall Log

Project: apache/hamilton

Summary: Found 8 structured pitfall items, including 1 high-severity (blocking) item. Top priority: capability evidence risk, which requires verification.

## 1. Capability evidence risk - Capability evidence risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/apache/hamilton/issues/1043

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/apache/hamilton/issues/1649

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/apache/hamilton

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apache/hamilton

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/apache/hamilton

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/apache/hamilton

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apache/hamilton

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apache/hamilton

<!-- canonical_name: apache/hamilton; human_manual_source: deepwiki_human_wiki -->