# https://github.com/togethercomputer/together-python Project Manual

Generated at: 2026-05-28 02:57:31 UTC

## Table of Contents

- [Overview](#page-overview)
- [Installation and Setup](#page-installation)
- [Client Architecture](#page-client-architecture)
- [Type System](#page-type-system)
- [Chat Completions](#page-chat-completions)
- [Completions API](#page-completions)
- [Embeddings and Reranking](#page-embeddings-rerank)
- [Image Generation](#page-image-generation)
- [Files API](#page-files-api)
- [Fine-Tuning](#page-finetuning)

<a id='page-overview'></a>

## Overview

### Related Pages

Related topics: [Installation and Setup](#page-installation), [Client Architecture](#page-client-architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
- [src/together/__init__.py](https://github.com/togethercomputer/together-python/blob/main/src/together/__init__.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)
- [src/together/abstract/api_requestor.py](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)
- [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
- [src/together/legacy/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/legacy/finetune.py)
- [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)
</details>

# Overview

The **together-python** repository is the official Python SDK and command-line interface (CLI) for the [Together AI API](https://api.together.xyz/). It gives developers programmatic access to large language models (LLMs), image generation models, embedding services, and fine-tuning capabilities hosted on the Together platform.

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

---

## Purpose and Scope

The together-python SDK serves as the primary integration point for developers who want to incorporate Together AI models into their Python applications or automate workflows via CLI. The library abstracts away the complexity of HTTP request handling, authentication, streaming responses, and error management.

**Core capabilities include:**

| Feature | Description |
|---------|-------------|
| Chat Completions | Multi-turn conversational interactions with LLMs |
| Text Completions | Traditional completion-based text generation |
| Image Generation | Text-to-image generation using diffusion models |
| Embeddings | Text vectorization for semantic search and retrieval |
| Reranking | Document reranking for improved search relevance |
| Fine-tuning | Custom model training on user-provided datasets |
| Function Calling | Tool/function calling for structured LLM interactions |

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

---

## Architecture

The repository follows a layered architecture with clear separation between the client interface, API communication, resource management, and utility functions.

### High-Level Architecture

```mermaid
graph TD
    subgraph "Client Layer"
        CLI[CLI Commands]
        PythonClient[Python Client]
    end
    
    subgraph "Resource Layer"
        Chat[Chat Completions]
        Completions[Text Completions]
        Images[Image Generation]
        Embeddings[Embeddings]
        Rerank[Reranking]
        FineTuning[Fine-tuning]
    end
    
    subgraph "Abstract Core"
        APIRequestor[API Requestor]
        Utils[Utilities]
    end
    
    subgraph "Transport Layer"
        Requests[requests library]
        Aiohttp[aiohttp]
    end
    
    subgraph "External"
        TogetherAPI[Together API]
    end
    
    CLI --> PythonClient
    PythonClient --> Chat
    PythonClient --> Completions
    PythonClient --> Images
    PythonClient --> Embeddings
    PythonClient --> Rerank
    PythonClient --> FineTuning
    
    Chat --> APIRequestor
    Completions --> APIRequestor
    Images --> APIRequestor
    Embeddings --> APIRequestor
    Rerank --> APIRequestor
    FineTuning --> APIRequestor
    
    APIRequestor --> Requests
    APIRequestor --> Aiohttp
    
    Requests --> TogetherAPI
    Aiohttp --> TogetherAPI
```

### Client Initialization

The SDK provides a unified `Together` client class that serves as the entry point for all API operations. The client handles authentication and manages resources for different API capabilities.

```python
from together import Together

# Read the API key from the TOGETHER_API_KEY environment variable (recommended)
client = Together()

# Or pass API key directly
client = Together(api_key="your-api-key")
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

---

## Directory Structure

```
src/together/
├── __init__.py           # Package entry point and exports
├── error.py              # Exception classes
├── abstract/
│   └── api_requestor.py  # Core HTTP request handling
├── cli/
│   └── api/
│       ├── chat.py       # CLI chat commands
│       ├── completions.py # CLI completion commands
│       └── finetune.py   # CLI fine-tuning commands
├── legacy/
│   └── finetune.py       # Deprecated fine-tuning interface
├── resources/
│   ├── chat/
│   │   └── completions.py # Chat completions implementation
│   └── finetune.py        # Fine-tuning API implementation
├── types/
│   └── error.py          # Pydantic error models
└── utils/
    └── files.py          # File validation utilities
```

Source: [src/together/__init__.py](https://github.com/togethercomputer/together-python/blob/main/src/together/__init__.py)

---

## API Request Flow

The API requestor handles all HTTP communication with the Together API. It supports both synchronous (using `requests`) and asynchronous (using `aiohttp`) operations.

```mermaid
sequenceDiagram
    participant Client as Together Client
    participant Requestor as API Requestor
    participant HTTP as HTTP Library
    participant API as Together API
    
    Client->>Requestor: Make request
    Requestor->>Requestor: Build URL with query params
    Requestor->>Requestor: Apply retries/exponential backoff
    Requestor->>HTTP: Send HTTP request
    HTTP->>API: POST/GET request
    API-->>HTTP: Response
    HTTP-->>Requestor: Raw response
    Requestor->>Requestor: Parse JSON
    Requestor-->>Client: Pydantic model response
```

### Request Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `TOGETHER_API_KEY` | Required | API authentication key |
| `TOGETHER_API_BASE` | `https://api.together.xyz` | Base API URL |
| `TIMEOUT_SECS` | 600 | Request timeout in seconds |
| `MAX_RETRIES` | 10 | Maximum retry attempts |
| `MAX_CONNECTION_RETRIES` | 5 | Connection retry limit |

Source: [src/together/abstract/api_requestor.py](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)
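
The retry step in the diagram above can be illustrated without the SDK. The following is a minimal, stdlib-only sketch of exponential backoff, not the code in `api_requestor.py`. The names `backoff_delays` and `call_with_retries`, and the delay values, are assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, initial_delay: float = 0.5, max_delay: float = 8.0) -> list[float]:
    """Return the wait before each retry: the delay doubles, capped at max_delay."""
    return [min(initial_delay * (2 ** attempt), max_delay) for attempt in range(max_retries)]


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exception types with exponential backoff."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of waiting for them.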

---

## Key Features

### Chat Completions

The chat completions endpoint handles multi-turn conversations and supports:

- Text-only messages
- Multi-modal content (text + images)
- System messages
- Function/tool calling
- Streaming responses

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "tell me about new york"}],
)
print(response.choices[0].message.content)
```

**Multi-modal example with images:**

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.png"
                }
            }
        ]
    }]
)
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

### Image Generation

Generate images from text prompts using diffusion models:

```python
response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=4,
)
print(response.data[0].b64_json)
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
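
The `b64_json` field printed above is a base64 string, so it has to be decoded before it can be viewed as an image. A minimal sketch using only the standard library follows. The helper name `save_b64_image` and the output filename are assumptions, and the actual image format depends on the model.

```python
import base64
from pathlib import Path


def save_b64_image(b64_json: str, path: str) -> int:
    """Decode a base64 string and write the bytes to path; return bytes written."""
    image_bytes = base64.b64decode(b64_json)
    return Path(path).write_bytes(image_bytes)


# Usage with a response like the one above (the filename is arbitrary):
# save_b64_image(response.data[0].b64_json, "space_robots_0.png")
```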

### Embeddings

Create vector embeddings for text similarity and retrieval tasks:

```python
outputs = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval',
    input=["Your text here"]
)
embeddings = [item.embedding for item in outputs.data]
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
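
Embedding vectors are usually compared with cosine similarity. The sketch below is a plain-Python version for illustration. `cosine_similarity` is a helper written for this example, not part of the SDK, and a numerical library would be a better fit for large batches.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors: treat as unrelated
    return dot / (norm_a * norm_b)


# Usage with the list built above:
# score = cosine_similarity(embeddings[0], embeddings[1])
```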

### Reranking

Improve search relevance with cross-encoder reranking:

```python
outputs = client.rerank.create(
    model="BAAI/bge-reranker",
    query="What is the capital of the United States?",
    documents=["New York", "Washington, D.C.", "Los Angeles"],
    top_n=2
)
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

### Fine-tuning

Train custom models on your own datasets. The SDK supports both full model fine-tuning and LoRA (Low-Rank Adaptation) training.

**Key fine-tuning parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | str | Base model to fine-tune |
| `training_file` | str | File ID of an uploaded training data file |
| `validation_file` | str | Optional file ID of an uploaded validation data file |
| `n_epochs` | int | Number of training epochs |
| `batch_size` | int | Training batch size |
| `learning_rate` | float | Learning rate |
| `lora` | bool | Whether to use LoRA training |
| `lora_r` | int | LoRA attention dimension |
| `train_on_inputs` | bool/auto | Mask user messages in training |
| `training_method` | str | Training method (`sft` or `dpo`) |

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)

**Checkpoint management:**

```bash
# Download fine-tuned model weights
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Download specific checkpoint
together fine-tuning download ft-job-id --checkpoint-step 1000
```

Source: [src/together/cli/api/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/finetune.py)

---

## Error Handling

The SDK defines a comprehensive hierarchy of exception types for different error conditions:

### Exception Hierarchy

```
TogetherException (base)
├── RateLimitError
├── FileTypeError
├── AttributeError
├── Timeout
└── APIConnectionError
```

### Error Response Model

```python
class TogetherErrorResponse(BaseModel):
    message: str | None       # Error message
    type_: str | None         # Error type (alias: "type")
    param: str | None         # Parameter causing error
    code: str | None          # Error code
```

Source: [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

Source: [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)

---

## File Validation

The SDK validates uploaded training files to ensure proper format. Supported content types include:

| Content Type | Description | Validation Rules |
|--------------|-------------|------------------|
| `text` | Plain text content | Must contain `text` field as string |
| `image_url` | Image URL reference | Only allowed in user messages, requires `image_url` dict |

```python
# Example valid content structure
content = [
    {"type": "text", "text": "Sample training text"},
    {
        "type": "image_url",
        "image_url": {"url": "https://example.com/image.png"}
    }
]
```

Source: [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
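
The two rules in the table can be expressed as a small checker. This is a simplified sketch of the idea, not the validation code in `src/together/utils/files.py`. The function name `validate_content` and the wording of the messages are assumptions.

```python
from typing import Any


def validate_content(content: list[dict[str, Any]], role: str) -> list[str]:
    """Return a list of problems found in one message's content parts."""
    problems: list[str] = []
    for i, part in enumerate(content):
        kind = part.get("type")
        if kind == "text":
            if not isinstance(part.get("text"), str):
                problems.append(f"part {i}: 'text' must be a string")
        elif kind == "image_url":
            if role != "user":
                problems.append(f"part {i}: image_url is only allowed in user messages")
            if not isinstance(part.get("image_url"), dict):
                problems.append(f"part {i}: 'image_url' must be a dict")
        else:
            problems.append(f"part {i}: unsupported type {kind!r}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a training file in a single pass.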

---

## CLI Interface

The SDK includes a comprehensive CLI for interactive use and automation:

```bash
# Chat completion
together chat.completions --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message user "Hello, world!"

# Text completion
together completions --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --prompt "Once upon a time"

# Fine-tuning
together fine-tuning create --help
together fine-tuning list
together fine-tuning download ft-job-id
```

Source: [src/together/cli/api/chat.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/chat.py)

Source: [src/together/cli/api/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/completions.py)

---

## Known Issues and Limitations

Be aware of the following known issues when using this SDK:

### Dependency Conflicts

**typer Version Constraint** (Issue #348)
The SDK depends on `typer<0.16.0`, which may conflict with other packages requiring newer versions. This can cause import errors in projects that have strict version requirements.

**Pillow Version** (Issue #237)
The current pillow dependency may cause transitive dependency issues when used alongside packages like autogen `0.4.2` which require `pillow>=11.0.0`.

### Missing Model Types

**transcribe ModelType** (Issue #337)
The `ModelType` enum is missing the "transcribe" type, which may cause Pydantic validation errors when working with transcription models.

### Function Calling Limitations

**Tool Response Messages** (Issue #113)
Messages with `role='tool'` in multi-turn function calling scenarios may not be properly accepted by the API, requiring workarounds for certain use cases.

---

## Legacy API Support

The SDK maintains backward compatibility with the legacy fine-tuning API through the `together.legacy.finetune` module:

```python
from together.legacy.finetune import TogetherFineTuning

# Legacy method (deprecated)
result = TogetherFineTuning.create(
    training_file="path/to/file.jsonl",
    model="meta-llama/Llama-3-8b",
    n_epochs=3,
)
```

Source: [src/together/legacy/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/legacy/finetune.py)

---

## Development Setup

To contribute to the SDK or run it from source:

```bash
# Install Poetry (v1.6.1+)
curl -sSL https://install.python-poetry.org | python3 -

# Install development dependencies
poetry install --with quality,tests

# Set up pre-commit hooks
pre-commit install

# Run tests
make tests

# Run formatting
make format
```

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

---

## See Also

- [Together AI Documentation](https://docs.together.ai/)
- [API Reference](https://api.together.xyz/docs)
- [Function Calling Guide](https://docs.together.ai/docs/function-calling)
- [Fine-tuning Documentation](https://docs.together.ai/docs/fine-tuning)
- [GitHub Issues](https://github.com/togethercomputer/together-python/issues)

---

<a id='page-installation'></a>

## Installation and Setup

### Related Pages

Related topics: [Overview](#page-overview)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [pyproject.toml](https://github.com/togethercomputer/together-python/blob/main/pyproject.toml)
- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)
- [src/together/constants.py](https://github.com/togethercomputer/together-python/blob/main/src/together/constants.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)
</details>

# Installation and Setup

This page covers the complete installation and setup process for the Together Python SDK (`together-python`), including prerequisites, configuration options, CLI setup, and development environment configuration.

## Overview

The `together-python` SDK provides a Python interface and command-line tool for interacting with the Together AI API. It enables developers to:

- Access chat completions with support for multimodal inputs (text and images)
- Generate text completions
- Create and manage fine-tuning jobs
- Generate images
- Compute embeddings and reranking
- Manage files and model resources

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Prerequisites

### Python Version Requirements

The SDK requires **Python 3.10 or higher**. The project uses modern Python features including type hints and async/await patterns.
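
A script that depends on the SDK can check the interpreter version up front. The sketch below assumes the 3.10 minimum stated above. `meets_minimum` is a hypothetical helper name.

```python
import sys

MINIMUM_PYTHON = (3, 10)


def meets_minimum(version: tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """True when the (major, minor) part of version is at least MINIMUM_PYTHON."""
    return version[:2] >= MINIMUM_PYTHON


if not meets_minimum():
    print(f"Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]}+ is required")
```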

### API Key

A valid Together AI API key is required for all API operations. You can obtain an API key by:

1. Creating an account at [api.together.ai](https://api.together.ai)
2. Navigating to the [API keys settings page](https://api.together.xyz/settings/api-keys)

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Installation Methods

### Standard Installation (pip)

Install the latest stable release from PyPI:

```bash
pip install together
```

### Poetry Installation

For projects using Poetry as the dependency manager:

```bash
poetry add together
```

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

### Development Installation

For contributors who want to modify the source code or run tests locally:

```bash
# Clone the repository
git clone https://github.com/togethercomputer/together-python.git
cd together-python

# Install with development dependencies
poetry install --with quality,tests
```

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

## Configuration

### Environment Variables

The SDK supports configuration through environment variables. The primary variable required is:

| Environment Variable | Description | Required |
|---------------------|-------------|----------|
| `TOGETHER_API_KEY` | Your Together AI API key | Yes |

#### Setting the API Key

**Unix/Linux/macOS:**

```bash
export TOGETHER_API_KEY=xxxxx
```

**Windows (Command Prompt):**

```cmd
set TOGETHER_API_KEY=xxxxx
```

**Windows (PowerShell):**

```powershell
$env:TOGETHER_API_KEY="xxxxx"
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
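
Checking for the variable early gives a clearer failure than an authentication error later on. A minimal sketch, assuming the variable name from the table above, follows. `require_api_key` is a hypothetical helper, not an SDK function.

```python
import os


def require_api_key(env_var: str = "TOGETHER_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating a client")
    return key
```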

### Client Configuration

The Python client can be initialized with or without an explicit API key:

**Using environment variable (recommended):**

```python
from together import Together

client = Together()  # Automatically reads TOGETHER_API_KEY
```

**Explicit API key:**

```python
from together import Together

client = Together(api_key="your-api-key-here")
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Optional Dependencies

The SDK uses Poetry for dependency management. Optional dependencies are organized into Poetry dependency groups, which are installed with `--with`:

| Dependency Group | Purpose | Install Command |
|------------------|---------|------------------|
| `extended_testing` | Additional testing dependencies | `poetry install --with extended_testing` |

When adding new dependencies, maintainers follow a strict policy: dependencies should be optional and users who don't have them installed should be able to import the SDK without warnings or errors.

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

## CLI Setup

### Installation

The CLI is included with the main package installation. After installing `together`, the `together` command becomes available.

### Verification

Verify the CLI installation:

```bash
together --help
```

### Common CLI Commands

| Command | Description |
|---------|-------------|
| `together chat.completions` | Chat completions |
| `together completions` | Text completions |
| `together images` | Image generation |
| `together files` | File management |
| `together fine-tuning` | Fine-tuning operations |
| `together models` | List and manage models |

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Client Initialization Patterns

### Synchronous Client

```python
from together import Together

client = Together()
```

### Asynchronous Client

```python
from together import AsyncTogether

async_client = AsyncTogether()
```

### Basic Usage Flow

```mermaid
graph TD
    A[Install together package] --> B[Set TOGETHER_API_KEY]
    B --> C[Import Together or AsyncTogether]
    C --> D[Initialize client]
    D --> E[Call API methods]
    E --> F[Process response]
```

## SDK Constants

The SDK defines several constants in `src/together/constants.py`:

| Constant | Purpose |
|----------|---------|
| API base URLs | Endpoint configurations |
| Default timeouts | Request timeout values |
| Version information | SDK version tracking |

Source: [src/together/constants.py](https://github.com/togethercomputer/together-python/blob/main/src/together/constants.py)

## Error Handling Setup

The SDK provides a comprehensive error hierarchy for handling API-related issues:

### Exception Types

| Exception Class | Purpose |
|----------------|---------|
| `TogetherException` | Base exception class |
| `RateLimitError` | Handle rate limiting |
| `FileTypeError` | File format validation errors |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout handling |
| `AuthenticationError` | Invalid API key errors |

Source: [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

### Error Response Format

API errors are returned with structured information:

```python
from pydantic import BaseModel, Field


class TogetherErrorResponse(BaseModel):
    message: str | None = None                      # Error message
    type_: str | None = Field(None, alias="type")   # Error type (JSON key: "type")
    param: str | None = None                        # Parameter causing error
    code: str | None = None                         # Error code
```

Source: [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)

### Error Handling Example

```python
from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"API error: {e}")
```

## Known Compatibility Issues

### Typer Version Conflict

> **Note:** The SDK has a dependency constraint on `typer<0.16.0`. If your project requires `typer>=0.16.0`, you may encounter dependency conflicts. See [Issue #348](https://github.com/togethercomputer/together-python/issues/348) for tracking.

This is a known community issue where projects depending on newer typer versions cannot use together-python without resolving the conflict.

### Pillow Version

> **Note:** The SDK's Pillow dependency may conflict with packages that require `pillow>=11.0.0`, such as autogen `0.4.2`. See [Issue #237](https://github.com/togethercomputer/together-python/issues/237) for details.

## Development Environment Setup

### 1. Install Poetry

Follow the [official Poetry installation guide](https://python-poetry.org/docs/#installation).

> **Important:** If you use Conda or Pyenv, create and activate a new environment before installing Poetry:
> ```bash
> conda create -n together python=3.10
> conda activate together
> ```

### 2. Configure Poetry

Tell Poetry to use the active Python environment:

```bash
poetry config virtualenvs.prefer-active-python true
```

### 3. Install Dependencies

```bash
poetry install --with quality,tests
```

### 4. Set Up Pre-commit Hooks

The project uses pre-commit for auto-formatting and linting:

```bash
pre-commit install
```

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

### Running Tests

#### Unit Tests

```bash
make tests
```

#### Integration Tests

> **Warning:** Integration tests require an active API key and will incur charges.

```bash
make integration_tests
```

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

## Formatting and Linting

Before submitting changes, run formatting locally:

```bash
make format
```

The CI system automatically checks formatting, linting, and tests.

Source: [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)

## Quick Start Checklist

| Step | Task | Command/Action |
|------|------|----------------|
| 1 | Check Python version | `python --version` (requires 3.10+) |
| 2 | Install SDK | `pip install together` |
| 3 | Set API key | `export TOGETHER_API_KEY=xxxxx` |
| 4 | Verify installation | `python -c "from together import Together; print('OK')"` |
| 5 | Test basic call | Run a simple chat completion |

## See Also

- [Chat Completions](#page-chat-completions) - Using the chat API
- [Fine-Tuning](#page-finetuning) - Training custom models
- [Image Generation](#page-image-generation) - Creating images
- [Client Architecture](#page-client-architecture) - Request flow and error handling
- [Contributing Guide](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md) - Development contribution guidelines

---

<a id='page-client-architecture'></a>

## Client Architecture

### Related Pages

Related topics: [Chat Completions](#page-chat-completions), [Type System](#page-type-system)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/client.py](https://github.com/togethercomputer/together-python/blob/main/src/together/client.py)
- [src/together/abstract/api_requestor.py](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)
- [src/together/resources/__init__.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/__init__.py)
- [src/together/resources/chat/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)
- [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
</details>

# Client Architecture

## Overview

The Together Python SDK provides a unified interface for interacting with the Together AI platform through both a programmatic Python client and a command-line interface (CLI). The client architecture follows a layered design pattern that separates concerns between API communication, resource management, and user-facing interfaces.

The architecture is designed to support multiple API capabilities including chat completions, text completions, embeddings, image generation, file management, and fine-tuning operations.

Source: [src/together/resources/__init__.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/__init__.py)

## Core Components

The SDK architecture consists of three primary layers that work together to provide a seamless developer experience:

```mermaid
graph TD
    A[User Application] --> B[Together Client]
    B --> C[Resource Modules]
    C --> D[API Requestor]
    D --> E[Together AI API]
    E --> D
    D --> B
    B --> A
    
    F[CLI Commands] --> B
    
    subgraph Resources
        G[Chat Completions]
        H[Completions]
        I[Embeddings]
        J[Images]
        K[Files]
        L[Fine-tuning]
    end
    
    C --> G
    C --> H
    C --> I
    C --> J
    C --> K
    C --> L
```

### Together Client Class

The `Together` class serves as the main entry point for the SDK. It provides a synchronous interface for all API operations and manages the underlying HTTP client configuration.

**Key Responsibilities:**

- Initialization and configuration of API credentials
- Delegation of requests to appropriate resource modules
- Streaming response handling
- Timeout and connection management

Source: [src/together/client.py](https://github.com/togethercomputer/together-python/blob/main/src/together/client.py)

**Basic Initialization:**

```python
from together import Together

# Using environment variable (TOGETHER_API_KEY)
client = Together()

# Explicit API key
client = Together(api_key="your-api-key-here")
```

### API Requestor

The `APIRequestor` class handles the low-level communication with the Together AI API. It abstracts away HTTP details and provides a consistent interface for both synchronous and asynchronous operations.

**Requestor Responsibilities:**

- Constructing HTTP requests with proper authentication headers
- Handling request serialization and response parsing
- Managing streaming responses
- Implementing retry logic for transient failures
- Processing error responses into typed exceptions

Source: [src/together/abstract/api_requestor.py](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)

### Resource Modules

Resource modules encapsulate API operations by domain. Each resource module provides type-safe methods for a specific category of API endpoints.

| Resource Module | Purpose | Key Methods |
|-----------------|---------|-------------|
| `chat.completions` | Chat-based language model interactions | `create()`, streaming variants |
| `completions` | Text completion operations | `create()`, streaming variants |
| `embeddings` | Text embedding generation | `create()` |
| `images` | Image generation | `generate()` |
| `files` | File upload, retrieval, and management | `upload()`, `retrieve()`, `list()`, `delete()` |
| `fine_tuning` | Model fine-tuning operations | `create()`, `retrieve()`, `list()`, `cancel()`, `download()` |

Source: [src/together/resources/__init__.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/__init__.py)
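
The delegation between client, resource modules, and requestor can be modelled in a few lines. The sketch below is a toy illustration of the layering, not the SDK's implementation. Every class name here (`RecordingRequestor`, `ChatCompletions`, `Chat`, `MiniClient`) is hypothetical, and the requestor records calls instead of sending HTTP requests.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingRequestor:
    """Stand-in for the requestor layer: records calls instead of sending HTTP."""

    api_key: str
    sent: list[tuple[str, str, dict[str, Any]]] = field(default_factory=list)

    def request(self, method: str, path: str, params: dict[str, Any]) -> dict[str, Any]:
        self.sent.append((method, path, params))
        return {"path": path, "params": params}


class ChatCompletions:
    """Resource object: builds parameters, then delegates to the requestor."""

    def __init__(self, requestor: RecordingRequestor) -> None:
        self._requestor = requestor

    def create(self, model: str, messages: list[dict[str, str]]) -> dict[str, Any]:
        params = {"model": model, "messages": messages}
        return self._requestor.request("POST", "/chat/completions", params)


class Chat:
    """Namespace so calls read as client.chat.completions.create(...)."""

    def __init__(self, requestor: RecordingRequestor) -> None:
        self.completions = ChatCompletions(requestor)


class MiniClient:
    """Entry point: owns one requestor and shares it with every resource."""

    def __init__(self, api_key: str) -> None:
        self.requestor = RecordingRequestor(api_key)
        self.chat = Chat(self.requestor)
```

Sharing a single requestor means credentials and transport settings live in one place, while each resource only knows its own endpoint and parameters.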

## Request/Response Flow

### Synchronous Request Flow

```mermaid
sequenceDiagram
    participant App as Application Code
    participant Client as Together Client
    participant Resource as Resource Module
    participant Requestor as API Requestor
    participant API as Together AI API

    App->>Client: client.chat.completions.create(...)
    Client->>Resource: delegating request
    Resource->>Resource: build request parameters
    Resource->>Requestor: request()
    Requestor->>API: POST /chat/completions
    API-->>Requestor: JSON Response
    Requestor->>Resource: parse response
    Resource-->>Client: typed response object
    Client-->>App: ChatCompletionResponse
```

### Streaming Response Handling

The SDK supports server-sent events (SSE) streaming for real-time token delivery. Passing `stream=True` returns an iterator of chunks instead of a single response:

**Chat Completions Streaming:**

```python
from together import Together

client = Together()
stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

Source: [src/together/resources/chat/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)

With the synchronous `Together` client, the stream is a regular iterator that yields `ChatCompletionChunk` objects as they arrive. `AsyncTogether` returns an async generator instead, which is consumed with `async for`.

### Asynchronous Support

The SDK provides `AsyncTogether` for applications requiring concurrent API operations:

```python
import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def concurrent_requests():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Prompt {i}"}]
        )
        for i in range(5)
    ]
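    # gather runs the requests concurrently and returns results in input order
    responses = await asyncio.gather(*tasks)
    return responses


# Run the coroutine from synchronous code
responses = asyncio.run(concurrent_requests())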
```
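Sending many prompts through an unbounded `asyncio.gather` can run into rate limits. The following is a minimal sketch of capping the number of in-flight requests with `asyncio.Semaphore`. The names `gather_limited` and `fake_request` are hypothetical and not part of the SDK; `fake_request` stands in for `async_client.chat.completions.create(...)` so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: List[Callable[[], Awaitable[T]]], limit: int
) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        # The semaphore blocks here once `limit` calls are in flight
        async with semaphore:
            return await factory()

    # gather preserves the input order in its result list
    return await asyncio.gather(*(run(f) for f in factories))


async def fake_request(i: int) -> str:
    """Hypothetical stand-in for an SDK call; replace with a real request."""
    await asyncio.sleep(0.01)
    return f"response {i}"


def demo() -> List[str]:
    # `i=i` binds the loop variable at definition time
    factories = [lambda i=i: fake_request(i) for i in range(5)]
    return asyncio.run(gather_limited(factories, limit=2))
```

Passing factories rather than coroutine objects means no request is started until a slot is free.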

## Error Handling

The SDK defines a hierarchy of exception types for different error conditions, enabling precise error handling in application code.

### Exception Hierarchy

```
TogetherException (base)
├── RateLimitError
├── FileTypeError
├── AttributeError
├── Timeout
├── APIConnectionError
```

Source: [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

### Error Response Model

API error responses are parsed into structured `TogetherErrorResponse` objects:

| Field | Type | Description |
|-------|------|-------------|
| `message` | `str \| None` | Human-readable error message |
| `type` | `str \| None` | Error category/type |
| `param` | `str \| None` | Parameter that caused the error |
| `code` | `str \| None` | Machine-readable error code |

Source: [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)

### Error Handling Example

```python
from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    # Catch the specific subclass first; str(e) carries the error text
    print(f"Rate limited: {e}")
except TogetherException as e:
    print(f"API error: {e}")
```

## File Validation Architecture

The SDK includes robust file validation for fine-tuning datasets, ensuring data integrity before upload.

```mermaid
graph LR
    A[Input File] --> B{File Type Check}
    B -->|JSONL| C[JSONL Validator]
    B -->|JSON| D[JSON Validator]
    C --> E{Content Validation}
    D --> E
    E --> F[Schema Validation]
    F --> G[Size Limits Check]
    G --> H[Upload Ready]
    E -->|Invalid| I[InvalidFileFormatError]
```

### Validation Rules

The file validation system enforces the following constraints:

| Rule | Limit | Description |
|------|-------|-------------|
| Maximum base64 image size | 10MB | Per image in multimodal datasets |
| Maximum images per example | 5 | Images allowed in a single training example |
| Required fields | `type`, `content` | For each message in multimodal format |

Source: [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)

### Supported Content Types

| Type | Description | Role Restrictions |
|------|-------------|-------------------|
| `text` | Plain text content | Any role |
| `image_url` | Base64-encoded image | User role only |
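The rules in the two tables above can be approximated locally before uploading. This is a minimal sketch, not the SDK's validator in `src/together/utils/files.py`: `check_example` is a hypothetical helper, and measuring image size by the length of the `url` string is an assumption made for illustration.

```python
from typing import Any, Dict, List

MAX_IMAGES_PER_EXAMPLE = 5                  # limit from the table above
MAX_BASE64_IMAGE_BYTES = 10 * 1024 * 1024   # 10MB, from the table above


def check_example(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the problems found in one training example (empty if none)."""
    problems: List[str] = []
    image_count = 0
    for m_idx, message in enumerate(messages):
        role = message.get("role")
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-text message, nothing more to check
        if not isinstance(content, list):
            problems.append(f"message {m_idx}: content must be a string or a list")
            continue
        for item in content:
            if not isinstance(item, dict) or "type" not in item:
                problems.append(f"message {m_idx}: each content item needs a `type`")
            elif item["type"] == "text":
                if not isinstance(item.get("text"), str):
                    problems.append(f"message {m_idx}: `text` must be a string")
            elif item["type"] == "image_url":
                image_count += 1
                if role != "user":
                    problems.append(
                        f"message {m_idx}: only user messages can contain images"
                    )
                image_url = item.get("image_url")
                url = image_url.get("url", "") if isinstance(image_url, dict) else ""
                # Assumption: size is judged by the encoded string length
                if len(url) > MAX_BASE64_IMAGE_BYTES:
                    problems.append(f"message {m_idx}: image exceeds 10MB")
            else:
                problems.append(f"message {m_idx}: unsupported type {item['type']!r}")
    if image_count > MAX_IMAGES_PER_EXAMPLE:
        problems.append(f"too many images: {image_count} > {MAX_IMAGES_PER_EXAMPLE}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a line at once.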

## Fine-tuning Architecture

The fine-tuning module provides comprehensive support for training custom models on the Together platform.

### Training Methods

The SDK supports multiple fine-tuning methodologies:

| Method | Description | Checkpoint Types |
|--------|-------------|------------------|
| Full training | Updates all model weights | Default only |
| LoRA | Low-rank adaptation | Default, Merged, Adapter |
| DPO | Direct Preference Optimization | Default |
| SimPO | Simple Preference Optimization | Default |
| RPO | Reward Preference Optimization | Default |

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)

### Checkpoint Management

The fine-tuning resource handles checkpoint retrieval and download:

```python
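from together import Together
from together.types.finetune import DownloadCheckpointType

client = Together()
fine_tune_id = "ft-..."  # placeholder: substitute the ID of your fine-tuning job

# List available checkpoints
checkpoints = client.fine_tuning.list_checkpoints(fine_tune_id)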

# Download specific checkpoint
result = client.fine_tuning.download(
    fine_tune_id,
    output="./checkpoints",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED
)
```

## CLI Architecture

The command-line interface is built using Click and mirrors the Python client functionality.

### CLI Command Structure

```
together
├── chat.completions
├── completions
├── embeddings
├── files
│   ├── check
│   ├── upload
│   ├── list
│   ├── retrieve
│   └── delete
├── fine-tuning
│   ├── create
│   ├── list
│   ├── retrieve
│   ├── cancel
│   ├── download
│   └── delete
└── models
    ├── list
    └── start
```

Source: [src/together/cli/api/chat.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/chat.py) and [src/together/cli/api/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/finetune.py)

### CLI Configuration

The CLI supports environment variable configuration:

```bash
# Set API key
export TOGETHER_API_KEY=your-api-key

# Use CLI (each --message takes a role followed by its content)
together chat.completions --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Hello, world!"
```

## Known Issues and Limitations

### Dependency Compatibility

**Issue #348:** The SDK has a dependency constraint on `typer<0.16.0`, which may conflict with projects requiring newer versions of typer. This can cause dependency resolution failures in environments where multiple packages have conflicting typer requirements.

**Issue #237:** The `pillow` dependency version may conflict with transitive dependencies from other packages like `autogen>=0.4.2` that require `pillow>=11.0.0`.

### Model Type Validation

**Issue #337:** The `ModelObject` type definition may not include all valid model types, potentially causing Pydantic validation errors when working with newer or specialized model types like transcription models.

### Tool Response Handling

**Issue #113:** Multi-turn function calling workflows may encounter validation errors when processing tool response messages with `role='tool'`. Applications implementing function calling should ensure proper message formatting according to the Together AI API specification.

## Best Practices

### Connection Management

- Reuse the `Together` client instance across multiple requests to benefit from connection pooling
- Set appropriate timeout values for long-running operations like fine-tuning

### Error Recovery

- Implement exponential backoff for `RateLimitError` handling
- Validate file contents locally before upload to avoid wasted API calls
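The backoff recommendation above can be sketched as a small retry wrapper. This is an illustrative helper, not part of the SDK: `call_with_backoff` and its parameters are hypothetical names, and the `sleep` argument is injectable so the logic can be exercised without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay cap doubles each attempt: base, 2*base, 4*base, ...
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads out retries from many clients
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")
```

With the SDK, this would be used as `call_with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`, with `RateLimitError` imported from `together.error`.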

### Streaming Performance

- Process streaming chunks incrementally rather than buffering entire responses
- Use async variants (`AsyncTogether`) for applications making multiple concurrent requests

## See Also

- [Chat Completions Guide](#page-chat-completions)
- [Fine-tuning Guide](#page-finetuning)
- [CLI Reference](cli-reference)
- [Error Handling](error-handling)
- [Together AI Documentation](https://docs.together.ai/)

---

<a id='page-type-system'></a>

## Type System

### Related Pages

Related topics: [Client Architecture](#page-client-architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [src/together/types/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/error.py)
- [src/together/types/abstract.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/abstract.py)
- [src/together/types/common.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/common.py)
- [src/together/resources/chat/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
</details>

# Type System

The together-python SDK employs a comprehensive type system built on Pydantic for data validation, serialization, and API interaction. This document provides a detailed reference for developers working with the SDK's type definitions, error handling, and validation patterns.

## Overview

The type system serves three primary purposes within the together-python SDK:

1. **Data Validation**: Ensures API request parameters meet expected formats before transmission
2. **Serialization**: Converts Python objects to JSON for API communication and deserializes responses
3. **IDE Support**: Provides type hints for better developer experience and autocomplete

```mermaid
graph TD
    A[User Code] --> B[Pydantic Models]
    B --> C{Validation}
    C -->|Pass| D[API Request]
    C -->|Fail| E[Validation Error]
    D --> F[API Response]
    F --> G[Response Models]
    G --> H[User Code]
```

## Base Types

### Abstract Base Model

All SDK types inherit from `BaseModel`, which extends Pydantic's `BaseModel` with custom configuration:

```python
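# Source: src/together/types/abstract.py
import pydantic
from pydantic import ConfigDict


# Subclass Pydantic's BaseModel explicitly; a class cannot inherit from itself
class BaseModel(pydantic.BaseModel):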
    """Base model for all Together API types."""
    
    model_config = ConfigDict(
        populate_by_name=True,
        validate_default=True,
        arbitrary_types_allowed=True,
    )
```

The `BaseModel` configures:
- `populate_by_name=True`: Allows population by field name or alias
- `validate_default=True`: Validates default values during initialization
- `arbitrary_types_allowed=True`: Permits custom type annotations

### Error Response Model

The `TogetherErrorResponse` type defines the structure for API error responses:

| Field | Type | Description |
|-------|------|-------------|
| `message` | `str \| None` | Human-readable error message |
| `type` | `str \| None` | Error category/type |
| `param` | `str \| None` | Parameter that caused the error |
| `code` | `str \| None` | Machine-readable error code |

```python
# Source: src/together/types/error.py
class TogetherErrorResponse(BaseModel):
    message: str | None = None
    type_: str | None = Field(None, alias="type")
    param: str | None = None
    code: str | None = None
```

## Exception Hierarchy

The SDK defines a hierarchical exception system for granular error handling:

```mermaid
graph TD
    A[TogetherException<br/>Base Exception] --> B[RateLimitError]
    A --> C[FileTypeError]
    A --> D[AttributeError]
    A --> E[Timeout]
    A --> F[APIConnectionError]
    A --> G[InvalidRequestError]
    A --> H[AuthenticationError]
    A --> I[APIResponseError]
```

### Exception Types

| Exception Class | Purpose | Common Cause |
|-----------------|---------|--------------|
| `TogetherException` | Base exception for all SDK errors | General failures |
| `RateLimitError` | API rate limit exceeded | Too many requests |
| `FileTypeError` | Invalid file type submitted | Unsupported file format |
| `AttributeError` | Invalid attribute access | Missing or invalid parameter |
| `Timeout` | Request timeout | Slow network or API |
| `APIConnectionError` | Network connectivity issue | Connection failure |
| `InvalidRequestError` | Malformed request | Invalid parameters |
| `AuthenticationError` | Authentication failure | Invalid API key |
| `APIResponseError` | Unexpected API response | Server-side error |

```python
# Source: src/together/error.py
class RateLimitError(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)
```

### Exception Construction Pattern

All exception types accept flexible message parameters:

```python
# Source: src/together/error.py
class Timeout(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)
```

The message can be:
- `TogetherErrorResponse`: Parsed API error response
- `Exception`: Wrapped exception
- `str`: Direct error message
- `RequestException`: HTTP request exception

## Request and Response Types

### Chat Completions Types

The chat completions system uses structured types for requests and responses:

```python
# Source: src/together/resources/chat/completions.py
response, _, _ = await requestor.arequest(
    options=TogetherRequest(
        method="POST",
        url="chat/completions",
        params=parameter_payload,
    ),
    stream=stream,
)

if stream:
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)
```

### Streaming Response Types

Streaming responses yield `ChatCompletionChunk` objects:

| Field | Type | Description |
|-------|------|-------------|
| `choices` | `List[Choice]` | Generated completions |
| `model` | `str` | Model identifier |
| `id` | `str` | Request identifier |
| `usage` | `Usage` | Token usage statistics |

## File Validation Types

### Content Item Types

The SDK validates file content for fine-tuning datasets:

```python
# Source: src/together/utils/files.py
if item["type"] == "text":
    if "text" not in item or not isinstance(item["text"], str):
        raise InvalidFileFormatError(
            "The dataset is malformed, the `text` field must be present in the `content` item field and be"
            f" a string. Got '{item.get('text')!r}' instead.",
            line_number=idx + 1,
            error_source="key_value",
        )
elif item["type"] == "image_url":
    if role != "user":
        raise InvalidFileFormatError(
            "The dataset is malformed, only user messages can contain images.",
            line_number=idx + 1,
            error_source="key_value",
        )
```

### Content Type Enumeration

| Type | Valid Context | Description |
|------|---------------|-------------|
| `text` | Any role | Plain text content |
| `image_url` | User role only | Image URL reference |

## Common Issues and Troubleshooting

### Validation Errors

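Pydantic validation errors occur when data doesn't match the SDK's type definitions. This applies to parsed responses as well as to request parameters. The error below is raised when a model's `type` value is not one the installed SDK recognizes: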

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelObject
type
  Input should be 'chat', 'language', 'code', 'image', 'embedding',...
```

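**Resolution**: This error concerns the model's `type` field, not its name. It usually means the API returned a model type that the installed SDK's `ModelObject` definition does not list (see Issue #337), so upgrading to a newer SDK release may resolve it.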

### Invalid File Format Errors

When uploading fine-tuning datasets, content validation enforces strict rules:

```python
# Source: src/together/utils/files.py
if not isinstance(item, dict):
    raise InvalidFileFormatError(
        "The dataset is malformed, the `content` field must be a list of dicts.",
        line_number=idx + 1,
        error_source="key_value",
    )
```

### Type Mismatch in Streaming

When processing streaming responses, type assertions ensure correct handling:

```python
# Source: src/together/cli/api/completions.py
if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices
```

## Type Annotations in CLI

The CLI uses Click decorators with type annotations for command-line argument validation:

```python
# Source: src/together/cli/api/chat.py
@click.option(
    "--max-tokens",
    type=int,
    help="Max tokens to generate"
)
@click.option(
    "--temperature",
    type=float,
    help="Sampling temperature"
)
@click.option(
    "--stop",
    type=str,
    multiple=True,
    help="List of strings to stop generation"
)
```

### CLI Type Conversion

| CLI Option Type | Python Type | Notes |
|------------------|-------------|-------|
| `type=int` | `int` | Integer values |
| `type=float` | `float` | Decimal values |
| `type=str` | `str` | String values |
| `multiple=True` | `tuple` | Multiple values |
| `is_flag=True` | `bool` | Boolean flags |

## Async Type Handling

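The async client returns the same response types as the sync client. With `stream=True` it returns an async generator of `ChatCompletionChunk` objects instead of a single `ChatCompletionResponse`: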

```python
# Source: src/together/resources/chat/completions.py
if stream:
    assert not isinstance(response, TogetherResponse)
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)
```

## Best Practices

### Type Safety Guidelines

1. **Use Response Models**: Always use SDK response models instead of raw dictionaries
2. **Validate Early**: Check input types before API calls
3. **Handle Exceptions**: Catch specific exception types for targeted error handling
4. **Use Type Hints**: Enable IDE autocomplete with proper imports

### Importing Types

```python
from together.types.error import TogetherErrorResponse
from together.error import (
    TogetherException,
    RateLimitError,
    InvalidRequestError,
    Timeout,
)
```

## See Also

- [API Reference](https://github.com/togethercomputer/together-python)
- [Fine-tuning Guide](#page-finetuning)
- [Chat Completions](#page-chat-completions)
- [CLI Usage](cli-usage.md)

---

<a id='page-chat-completions'></a>

## Chat Completions

### Related Pages

Related topics: [Completions API](#page-completions), [Client Architecture](#page-client-architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/resources/chat/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)
- [src/together/cli/api/chat.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/chat.py)
- [src/together/abstract/api_requestor.py](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
</details>

# Chat Completions

The Chat Completions API provides a unified interface for interacting with large language models on the Together platform through conversational message-based interactions. This feature supports text-only and multimodal inputs, streaming responses, function calling, and various generation parameters to control model behavior.

## Overview

The Chat Completions resource is the primary interface for conversational AI interactions in the together-python SDK. It follows the OpenAI-compatible chat completions format, enabling developers to switch between providers with minimal code changes while leveraging Together's distributed inference infrastructure.

```mermaid
graph TD
    A[Client Application] --> B[Together Client]
    B --> C[Chat Completions.create]
    C --> D[API Requestor]
    D --> E[Together API]
    E --> F[Model Inference]
    F --> G[Response]
    G --> D
    D --> B
    B --> H[ChatCompletionResponse]
    
    style A fill:#e1f5fe
    style H fill:#c8e6c9
```

**Key capabilities include:**

- **Text Completions**: Standard conversational text generation with system, user, and assistant roles
- **Multimodal Input**: Support for images alongside text in user messages
- **Streaming**: Real-time token-by-token response streaming
- **Function Calling**: Tool-use with structured function definitions and responses
- **Safety Controls**: Built-in moderation model integration
- **Audio Support**: Attach audio URLs to messages for Whisper-transcribed context

Source: [src/together/resources/chat/completions.py:1-50](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)

## Installation and Setup

### Environment Configuration

The SDK requires a Together API key for authentication. You can obtain one from the [Together Playground settings page](https://api.together.xyz/settings/api-keys).

```bash
export TOGETHER_API_KEY=your_api_key_here
```

### Client Initialization

```python
from together import Together

# Using environment variable
client = Together()

# Explicit API key
client = Together(api_key="your_api_key_here")

# Custom base URL for testing
client = Together(
    api_key="your_api_key_here",
    base_url="https://api.together.xyz"
)
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

## API Reference

### Method Signature

```python
ChatCompletions.create(
    model: str,
    messages: List[ChatCompletionMessageParam],
    frequency_penalty: Optional[float] = None,
    max_tokens: Optional[int] = None,
    n: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    stop: Optional[Union[str, List[str]]] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    min_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    logprobs: Optional[int] = None,
    echo: Optional[bool] = None,
    safety_model: Optional[str] = None,
    response_format: Optional[ResponseFormat] = None,
    tools: Optional[List[ChatCompletionToolParam]] = None,
    tool_choice: Optional[Union[ChatCompletionToolChoiceEnum, ChatCompletionNamedToolChoiceParam]] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    max_completion_tokens: Optional[int] = None,
) -> Union[ChatCompletionResponse, Iterator[ChatCompletionChunk]]  # iterator when stream=True
```

Source: [src/together/resources/chat/completions.py:1-50](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)

### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model` | `str` | Yes | - | Model identifier (e.g., `meta-llama/Llama-4-Scout-17B-16E-Instruct`) |
| `messages` | `List[ChatCompletionMessageParam]` | Yes | - | List of conversation messages with roles |
| `temperature` | `float` | No | `0.7` | Sampling temperature (0.0-2.0) |
| `top_p` | `float` | No | `1.0` | Nucleus sampling threshold |
| `top_k` | `int` | No | - | Top-k token selection |
| `min_p` | `float` | No | - | Minimum probability threshold |
| `max_tokens` | `int` | No | `256` | Maximum tokens to generate |
| `max_completion_tokens` | `int` | No | - | Alternative to max_tokens |
| `stream` | `bool` | No | `False` | Enable streaming response |
| `stop` | `str` or `List[str]` | No | - | Stop sequences |
| `n` | `int` | No | `1` | Number of completions to generate |
| `presence_penalty` | `float` | No | `0.0` | Penalize repeated tokens |
| `frequency_penalty` | `float` | No | `0.0` | Penalize frequent tokens |
| `repetition_penalty` | `float` | No | `1.0` | Token repetition penalty |
| `logprobs` | `int` | No | - | Return log probabilities |
| `echo` | `bool` | No | `False` | Echo prompt in response |
| `safety_model` | `str` | No | - | Moderation model identifier |
| `response_format` | `ResponseFormat` | No | - | Constrain output format (JSON schema) |
| `tools` | `List[ChatCompletionToolParam]` | No | - | Available function definitions |
| `tool_choice` | `str` or `dict` | No | `"auto"` | Tool selection strategy |
| `audio` | `ChatCompletionAudioParam` | No | - | Audio parameters for voice input |

## Message Format

### Message Roles

The chat completions API supports structured conversation turns through a role-based message system:

| Role | Description | Content Type |
|------|-------------|--------------|
| `system` | Instructions and context | Text only |
| `user` | Human input | Text, images, or mixed |
| `assistant` | Model responses | Text and tool calls |
| `tool` | Function execution results | Text (JSON) |
| `developer` | Developer instructions | Text only |

### Message Structure

```python
from typing import Any, Dict, List

from together import Together

client = Together()

# Each message is a plain dict with a role and its content
messages: List[Dict[str, Any]] = [
    {
        "role": "system",
        "content": "You are a helpful coding assistant."
    },
    {
        "role": "user", 
        "content": "Write a Python function to calculate factorial."
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages
)

print(response.choices[0].message.content)
```

Source: [src/together/resources/chat/completions.py:1-50](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)

### Multimodal Messages

User messages can include both text and images using a content array:

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.png"
                }
            }
        ]
    }]
)
```

Image URL content items must follow specific validation rules. The `image_url` field must be a dictionary containing a `url` key with a valid URL string. Images are only permitted in user role messages.

Source: [src/together/utils/files.py:1-50](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
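For local images, the content array can be assembled with a small helper. This is a sketch under the assumption that the endpoint accepts base64 `data:` URLs in the `url` field, as the fine-tuning validation rules suggest; `image_item_from_bytes` and `user_message` are hypothetical names, not SDK functions.

```python
import base64
from typing import Any, Dict, List


def image_item_from_bytes(data: bytes, mime_type: str = "image/png") -> Dict[str, Any]:
    """Wrap raw image bytes as an `image_url` content item using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def user_message(text: str, images: List[bytes]) -> Dict[str, Any]:
    """Build a user message: one text item followed by the images."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": text}]
    content.extend(image_item_from_bytes(img) for img in images)
    # Images are only permitted in user messages, so the role is fixed
    return {"role": "user", "content": content}
```

The resulting dict can be passed as one element of `messages`.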

## Streaming Responses

The API supports server-sent events (SSE) streaming for real-time token generation:

```python
from together import Together

client = Together()

stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

### Streaming Architecture

```mermaid
sequenceDiagram
    participant Client
    participant APIRequestor
    participant TogetherAPI
    participant Model
    
    Client->>APIRequestor: create(stream=True)
    APIRequestor->>TogetherAPI: POST /chat/completions
    TogetherAPI->>Model: Start inference
    Model-->>TogetherAPI: Token 1
    TogetherAPI-->>APIRequestor: SSE: data: {...}
    APIRequestor-->>Client: ChatCompletionChunk
    Model-->>TogetherAPI: Token 2
    TogetherAPI-->>APIRequestor: SSE: data: {...}
    APIRequestor-->>Client: ChatCompletionChunk
    Note over Model,Client: Streaming continues...
    Model-->>TogetherAPI: [DONE]
    TogetherAPI-->>APIRequestor: [DONE]
    APIRequestor-->>Client: Iterator ends
```

When streaming is enabled, the method returns an iterator (an async generator on `AsyncTogether`) that yields `ChatCompletionChunk` objects. Each chunk carries an incremental delta; concatenate the deltas to reconstruct the complete response.

Source: [src/together/resources/chat/completions.py:40-80](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)
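Accumulating the deltas can be sketched as follows. `collect_text` and `fake_chunk` are hypothetical helpers; the fake chunks only mimic the `choices[0].delta.content` attribute shape used in the streaming example, so the code runs without an API key. Skipping chunks with an empty `choices` list is a defensive assumption.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def collect_text(stream: Iterable[Any]) -> str:
    """Concatenate `delta.content` from each chunk into the full reply."""
    parts: List[str] = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensive: skip chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # skip None and empty deltas
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in with the same attribute shape as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Hel"),
    fake_chunk(None),
    fake_chunk("lo"),
    SimpleNamespace(choices=[]),
]
print(collect_text(fake_stream))  # prints "Hello"
```

With a real client, `collect_text(stream)` would take the object returned by `client.chat.completions.create(..., stream=True)`.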

## Function Calling

Function calling enables models to invoke predefined tools with structured outputs. This follows the OpenAI function calling schema.

### Defining Tools

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]
```

### Tool Execution Flow

```mermaid
graph TD
    A[User Query] --> B[Create with tools]
    B --> C{Model selects tool?}
    C -->|Yes| D[Return tool_call]
    C -->|No| E[Return text response]
    D --> F[Execute function]
    F --> G[tool role message]
    G --> H[Continue with messages]
    H --> B
    E --> I[Final Response]
    
    style D fill:#fff3e0
    style G fill:#e8f5e9
```

### Multi-turn Conversation

After receiving a function call, append the assistant's tool call message and the tool response:

```python
import json

# Initial request with tools (assumes `client` and `tools` from the examples above)
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

assistant_msg = response.choices[0].message
print(f"Tool called: {assistant_msg.tool_calls[0].function.name}")
print(f"Arguments: {assistant_msg.tool_calls[0].function.arguments}")

# Simulate tool execution
tool_result = {"temperature": 22, "conditions": "Sunny"}

# Continue conversation with tool response
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    assistant_msg,
    {
        "role": "tool",
        "tool_call_id": assistant_msg.tool_calls[0].id,
        "content": json.dumps(tool_result)
    }
]

final_response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
    tools=tools
)
```

> **Note**: There is a known issue (#113) where tool/function response messages with `role='tool'` may encounter validation errors. Ensure the `tool_call_id` matches exactly and the content is valid JSON.

Source: [src/together/resources/chat/completions.py:20-60](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)
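The "execute function" step in the flow above can be sketched as a small dispatcher. This assumes, following the OpenAI-style schema, that `function.arguments` arrives as a JSON string. `TOOL_REGISTRY`, `run_tool_call`, and the local `get_weather` implementation are hypothetical, and the `SimpleNamespace` object only mimics the attribute shape of a tool call.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict


def get_weather(location: str, unit: str = "celsius") -> Dict[str, Any]:
    """Hypothetical local implementation of the `get_weather` tool."""
    return {"location": location, "unit": unit, "temperature": 22}


# Maps tool names advertised to the model onto local callables
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(tool_call: Any) -> Dict[str, str]:
    """Execute one tool call and return the matching `tool` role message."""
    name = tool_call.function.name
    try:
        arguments = json.loads(tool_call.function.arguments or "{}")
        result = TOOL_REGISTRY[name](**arguments)
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        # Report failures back to the model instead of crashing the loop
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,  # must match the assistant's tool call
        "content": json.dumps(result),
    }


fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
tool_message = run_tool_call(fake_call)
```

Each returned message is appended to `messages` after the assistant's tool call message, one per entry in `tool_calls`.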

## CLI Interface

The Together CLI provides command-line access to chat completions:

```bash
# Basic chat completion (each --message takes a role followed by its content)
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Hello, how are you?"

# Output streams by default; pass --no-stream to wait for the full response
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Write a story" \
    --no-stream

# With temperature control
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Explain physics" \
    --temperature 0.8
```

### CLI Options

| Option | Type | Description |
|--------|------|-------------|
| `--message` | `(str, str)` multiple | Message as role-content tuple |
| `--model` | `str` | Model identifier (required) |
| `--max-tokens` | `int` | Maximum tokens to generate |
| `--temperature` | `float` | Sampling temperature |
| `--top-p` | `int` | Nucleus sampling |
| `--top-k` | `float` | Top-k sampling |
| `--stop` | `str` multiple | Stop sequences |
| `--repetition-penalty` | `float` | Repetition penalty |
| `--presence-penalty` | `float` | Presence penalty |
| `--frequency-penalty` | `float` | Frequency penalty |
| `--min-p` | `float` | Minimum p sampling |
| `--no-stream` | `flag` | Disable streaming |
| `--safety-model` | `str` | Moderation model |
| `--raw` | `flag` | Return raw JSON |

Source: [src/together/cli/api/chat.py:1-100](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/chat.py)

## Error Handling

The SDK provides structured exception types for different error conditions:

```python
from together import Together
from together.error import (
    TogetherException,
    RateLimitError,
    APIConnectionError,
    Timeout,
    AuthenticationError
)

client = Together()

try:
    response = client.chat.completions.create(
        model="invalid-model-name",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limited: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except TogetherException as e:
    print(f"API error: {e}")
```

### Exception Hierarchy

```mermaid
classDiagram
    class TogetherException {
        +message
    }
    class RateLimitError {
        +message
    }
    class APIConnectionError {
        +message
    }
    class Timeout {
        +message
    }
    class AuthenticationError {
        +message
    }
    class FileTypeError {
        +message
    }
    
    TogetherException <|-- RateLimitError
    TogetherException <|-- APIConnectionError
    TogetherException <|-- Timeout
    TogetherException <|-- FileTypeError
    TogetherException <|-- AuthenticationError
```

### Common Error Codes

| Error Type | Cause | Resolution |
|------------|-------|------------|
| `400 Bad Request` | Invalid parameters | Check message format, model name |
| `401 Unauthorized` | Invalid API key | Verify TOGETHER_API_KEY |
| `429 Too Many Requests` | Rate limit exceeded | Implement exponential backoff |
| `500 Internal Error` | Server error | Retry with backoff |
| `504 Gateway Timeout` | Request timeout | Increase timeout or retry |

Source: [src/together/error.py:1-80](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
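
The "exponential backoff" resolution in the table can be sketched as a small caller-side wrapper. This is a minimal illustration, not the SDK's internal retry code: the helper name `call_with_backoff`, its parameters, and the jitter formula are assumptions made for this example.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt and is capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Random jitter shortens the delay by up to 25% (hypothetical choice).
            sleep(delay * (1 - 0.25 * random.random()))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the SDK, `fn` could be a lambda wrapping `client.chat.completions.create(...)` and `retry_on` could be `(RateLimitError, Timeout)`. Because the SDK already retries internally, a wrapper like this is only useful for additional caller-side retries around critical operations.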

## Response Format

### Standard Response

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}]
)

# Access response attributes
print(response.id)           # chatcmpl-xxx
print(response.model)       # meta-llama/Llama-4-Scout-17B-16E-Instruct
print(response.choices[0].message.content)  # Response text
print(response.usage.prompt_tokens)         # Input tokens
print(response.usage.completion_tokens)    # Output tokens
print(response.usage.total_tokens)         # Total tokens
```

### Streaming Chunk

```python
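stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Each item is a ChatCompletionChunk
    print(chunk.id)                        # Shared by all chunks of one response
    print(chunk.choices[0].delta.content)  # Incremental content (may be None)
    print(chunk.choices[0].finish_reason)  # None until the final chunk, then e.g. 'stop' or 'length'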
```
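
To rebuild the full message from a stream, the incremental `delta.content` values can be concatenated. The sketch below assumes only the attributes shown above (`choices`, `delta.content`, `finish_reason`); `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks are plain `SimpleNamespace` objects rather than real SDK types.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, Optional[str]]:
    """Concatenate delta.content across chunks; return (text, last finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        choice = chunk.choices[0]
        delta = getattr(choice, "delta", None)
        if delta is not None and delta.content:
            parts.append(delta.content)
        reason = getattr(choice, "finish_reason", None)
        if reason is not None:
            finish_reason = reason
    return "".join(parts), finish_reason


def fake_chunk(content, finish_reason=None):
    """Build a stand-in chunk with the same attribute shape (not an SDK object)."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


text, reason = collect_stream(
    [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None, "stop")]
)
print(text, reason)  # Hello stop
```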

## Async Usage

The SDK provides async variants for concurrent operations:

```python
import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def multi_chat():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Query {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(multi_chat())
```
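
Launching many requests with a bare `asyncio.gather` can run into rate limits. One common way to cap the number of in-flight calls is an `asyncio.Semaphore`. The sketch below is generic: `gather_bounded` and `fake_request` are hypothetical names, and `fake_request` stands in for an `async_client.chat.completions.create(...)` call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 3,
) -> List[R]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(run_one(item) for item in items)))


async def _demo() -> List[str]:
    async def fake_request(i: int) -> str:
        await asyncio.sleep(0.01)  # stand-in for network latency
        return f"reply {i}"

    return await gather_bounded(list(range(5)), fake_request, limit=2)


print(asyncio.run(_demo()))
```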

## Retry Logic and Timeouts

The API requestor implements automatic retry with exponential backoff:

```python
from together import Together
from together.constants import (
    MAX_RETRIES,
    INITIAL_RETRY_DELAY,
    MAX_RETRY_DELAY,
    TIMEOUT_SECS
)

# Defaults defined in together/constants.py
# MAX_RETRIES: 5
# INITIAL_RETRY_DELAY: 0.5 seconds
# MAX_RETRY_DELAY: 8.0 seconds
# TIMEOUT_SECS: 600 seconds

# Custom configuration
client = Together(
    max_retries=3,
    timeout=300
)
```

The retry strategy handles:
- Connection timeouts
- 5xx server errors
- Rate limit responses (429)

Source: [src/together/abstract/api_requestor.py:1-100](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)

## Known Limitations

| Issue | Description | Workaround |
|-------|-------------|------------|
| typer version conflict | SDK requires `typer<0.16.0` | Use virtual environments |
| Model type validation | Some model types not recognized | Use model names directly |
| Tool response format | `role='tool'` messages may fail validation | Ensure proper `tool_call_id` and JSON content |

For the most current issues and workarounds, refer to the [GitHub Issues](https://github.com/togethercomputer/together-python/issues).

## Best Practices

1. **Token Management**: Always set `max_tokens` to prevent runaway generation
2. **Error Handling**: Wrap API calls in try-except blocks with appropriate exception handling
3. **Streaming**: Use streaming for better perceived latency on long responses
4. **Context Management**: Keep message lists manageable; trim old messages when the conversation exceeds the model's context window
5. **Safety**: Enable `safety_model` for user-facing applications
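
The context-management advice above could be implemented along these lines. This is a rough sketch: `trim_messages` is a hypothetical helper, and it uses character counts as a crude stand-in for tokens, which is an assumption rather than how the API measures context.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_messages(messages: List[Message], max_chars: int) -> List[Message]:
    """Keep system messages plus the most recent turns that fit in max_chars."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    # System prompts are always kept, so they are charged against the budget first.
    budget = max_chars - sum(len(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards so the newest turns are considered first.
    for message in reversed(others):
        cost = len(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "a" * 50},
    {"role": "assistant", "content": "b" * 50},
    {"role": "user", "content": "c" * 20},
]
trimmed = trim_messages(history, max_chars=80)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

A production version would count tokens with the target model's tokenizer and would usually avoid splitting a user/assistant pair.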

## See Also

- [Fine-tuning Documentation](./fine-tuning) - Training custom models
- [Embeddings](./embeddings) - Vector representations
- [Image Generation](./image-generation) - Multimodal generation
- [CLI Reference](./cli-reference) - Full CLI documentation
- [Error Handling](./error-handling) - Exception types and recovery

---

<a id='page-completions'></a>

## Completions API

### Related Pages

Related topics: [Chat Completions](#page-chat-completions), [Embeddings and Reranking](#page-embeddings-rerank)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/cli/api/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/completions.py)
- [src/together/resources/chat/completions.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/chat/completions.py)
- [src/together/abstract/api_requestor.py](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
- [CONTRIBUTING.md](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md)
</details>

# Completions API

The Completions API provides access to language model text completion endpoints in the Together AI platform. This API enables developers to generate text completions from various open-source models hosted on Together AI, supporting use cases ranging from code generation to creative writing.

## Overview

The Together Python SDK provides two primary APIs for text generation:

1. **Completions API** - Designed for legacy text completion models and prompt-based generation
2. **Chat Completions API** - Optimized for modern chat-based models with structured message formats

Both APIs support synchronous, asynchronous, and streaming modes of operation.

Source: [README.md:1-50](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Installation and Setup

### Environment Configuration

The SDK requires a Together API key for authentication. You can obtain one from the [Together Playground settings page](https://api.together.xyz/settings/api-keys).

```shell
export TOGETHER_API_KEY=xxxxx
```

### Client Initialization

```python
from together import Together

# Using environment variable
client = Together()

# Explicit API key
client = Together(api_key="xxxxx")
```

Source: [README.md:10-20](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Usage Patterns

### Synchronous Completion

The synchronous method blocks until the complete response is received:

```python
from together import Together

client = Together()
response = client.completions.create(
    model="codellama/CodeLlama-34b-Python-hf",
    prompt="Write a Next.js component with TailwindCSS for a header component.",
    max_tokens=200,
)
print(response.choices[0].text)
```

Source: [README.md:80-90](https://github.com/togethercomputer/together-python/blob/main/README.md)

### Streaming Completion

Streaming allows real-time response generation by processing chunks as they arrive:

```python
from together import Together

client = Together()
stream = client.completions.create(
    model="codellama/CodeLlama-34b-Python-hf",
    prompt="Write a Next.js component with TailwindCSS for a header component.",
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

Source: [README.md:92-103](https://github.com/togethercomputer/together-python/blob/main/README.md)

### Asynchronous Completion

The async API enables concurrent requests for improved throughput:

```python
import asyncio
from together import AsyncTogether

async_client = AsyncTogether()
prompts = [
    "Write a Next.js component with TailwindCSS for a header component.",
    "Write a python function for the fibonacci sequence",
]

async def async_completion(prompts):
    tasks = [
        async_client.completions.create(
            model="codellama/CodeLlama-34b-Python-hf",
            prompt=prompt,
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)

    for response in responses:
        print(response.choices[0].text)

asyncio.run(async_completion(prompts))
```

Source: [README.md:105-125](https://github.com/togethercomputer/together-python/blob/main/README.md)

## API Parameters

### Core Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier from the available Together AI models |
| `prompt` | string | Yes | The input prompt for text generation |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `temperature` | float | No | Sampling temperature (0.0-2.0, default varies by model) |
| `top_p` | float | No | Nucleus sampling probability threshold |
| `top_k` | integer | No | Top-k sampling parameter |
| `stream` | boolean | No | Enable streaming response (default: false) |
| `n` | integer | No | Number of completions to generate |
| `stop` | string/array | No | Stop sequence(s) to end generation |
| `logprobs` | integer | No | Number of top log probabilities to return |
| `echo` | boolean | No | Echo the prompt in the response |
| `repetition_penalty` | float | No | Penalty for token repetition |
| `presence_penalty` | float | No | Penalize tokens based on presence |
| `frequency_penalty` | float | No | Penalize tokens based on frequency |
| `min_p` | float | No | Minimum probability threshold for sampling |
| `safety_model` | string | No | Moderation model to use |

Source: [src/together/cli/api/completions.py:1-50](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/completions.py)

### Parameter Details

#### Sampling Parameters

- **temperature**: Controls randomness in generation. Lower values (0.1-0.3) produce more deterministic output, while higher values (0.7-1.0) increase creativity.
- **top_p**: Also known as nucleus sampling, controls the cumulative probability mass to consider.
- **top_k**: Limits token selection to the top k most probable tokens.
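
The interaction of these three parameters can be illustrated on a toy distribution. The sketch below follows the conventional definitions of temperature scaling, top-k, and nucleus (top-p) filtering; `filter_distribution` is a hypothetical helper, and whether the Together backend applies the filters in exactly this order is an assumption.

```python
import math
from typing import Dict


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
) -> Dict[str, float]:
    """Return renormalized token probabilities after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature rescales logits before the softmax; lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # top_k keeps only the k most probable tokens (0 disables the filter here).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"the": 2.0, "a": 1.0, "cat": 0.0}
print(filter_distribution(toy_logits, top_p=0.9))  # keeps "the" and "a"
print(filter_distribution(toy_logits, top_k=1))    # keeps only "the"
```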

#### Repetition Control

- **repetition_penalty**: Values > 1.0 discourage repetition, values < 1.0 encourage it.
- **presence_penalty**: Positive values encourage discussing new topics.
- **frequency_penalty**: Positive values reduce repetition of high-frequency tokens.
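
The three penalties can be sketched as adjustments to token logits before sampling. The formulas below are the commonly used conventions (a multiplicative repetition penalty plus additive presence and frequency penalties); `apply_penalties` is a hypothetical helper, and treating these as the exact formulas used by the Together backend is an assumption.

```python
from collections import Counter
from typing import Dict, Sequence


def apply_penalties(
    logits: Dict[str, float],
    generated: Sequence[str],
    repetition_penalty: float = 1.0,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
) -> Dict[str, float]:
    """Lower the logits of tokens that already appear in `generated`."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        if count > 0:
            # Multiplicative penalty: shrink positive logits, push negative ones lower.
            if logit > 0:
                logit = logit / repetition_penalty
            else:
                logit = logit * repetition_penalty
            # Additive penalties: a flat cost for appearing at all,
            # plus a cost that grows with the number of occurrences.
            logit -= presence_penalty + frequency_penalty * count
        adjusted[token] = logit
    return adjusted


adjusted = apply_penalties(
    {"cat": 2.0, "dog": 2.0, "eel": -1.0},
    generated=["cat", "cat", "eel"],
    repetition_penalty=1.25,
    presence_penalty=0.5,
    frequency_penalty=0.1,
)
print(adjusted)  # "dog" is unchanged; "cat" and "eel" are pushed down
```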

## CLI Usage

The SDK includes a command-line interface for completions:

```bash
together completions "Your prompt here" --model codellama/CodeLlama-34b-Python-hf
```

### CLI Options

| Option | Description | Default |
|--------|-------------|---------|
| `--model` | Model name | Required |
| `--max-tokens` | Max tokens to generate | None |
| `--temperature` | Sampling temperature | None |
| `--top-p` | Top p sampling | None |
| `--top-k` | Top k sampling | None |
| `--stop` | Stop sequences (multiple allowed) | None |
| `--no-stream` | Disable streaming | False |
| `--repetition-penalty` | Repetition penalty | None |
| `--presence-penalty` | Presence penalty | None |
| `--frequency-penalty` | Frequency penalty | None |
| `--min-p` | Minimum p | None |
| `--logprobs` | Return log probabilities | None |
| `--echo` | Echo prompt in response | False |
| `--n` | Number of generations | None |
| `--safety-model` | Moderation model | None |
| `--raw` | Return raw JSON response | False |

Source: [src/together/cli/api/completions.py:1-75](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/completions.py)

### CLI Streaming Output

When streaming is enabled (default), the CLI processes chunks in real-time:

```python
if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices

        if raw:
            click.echo(f"{json.dumps(chunk.model_dump(exclude_none=True))}")
            continue

        for stream_choice in sorted(chunk.choices, key=lambda c: c.index):
            assert isinstance(stream_choice, CompletionChoicesChunk)
            assert stream_choice.delta
            click.echo(f"{stream_choice.delta.content}", nl=False)
```

Source: [src/together/cli/api/completions.py:45-65](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/completions.py)

## Response Structure

### Completion Response

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier for the completion |
| `choices` | array | Array of completion choices |
| `choices[].text` | string | Generated text content |
| `choices[].index` | integer | Choice index for multiple completions |
| `choices[].finish_reason` | string | Reason for completion ending |
| `model` | string | Model used for generation |
| `usage` | object | Token usage statistics |

### Streaming Chunk Response

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Chunk identifier |
| `choices` | array | Array of delta choices |
| `choices[].delta` | object | Incremental text delta |
| `choices[].delta.content` | string | Delta text content |
| `choices[].index` | integer | Choice index |

## Architecture

### Request Flow

```mermaid
graph TD
    A[Client.completions.create] --> B[Validate Parameters]
    B --> C[APIRequestor]
    C --> D[POST Request to Together API]
    D --> E{"stream=True?"}
    E -->|Yes| F[Return Chunk Iterator]
    E -->|No| G[Parse Response]
    G --> H[Return CompletionResponse]
    F --> I[Stream Chunks]
    I --> J[Yield CompletionChunk]
```

### Response Handling

```mermaid
graph TD
    A[API Response] --> B{Streaming Mode?}
    B -->|Yes| C[Return Chunk Iterator]
    B -->|No| D[Return TogetherResponse]
    C --> E[CompletionChunk]
    D --> F[CompletionResponse]
```

## Error Handling

### Exception Types

The SDK defines specific exception types for different error conditions:

| Exception | Description |
|-----------|-------------|
| `TogetherException` | Base exception class |
| `RateLimitError` | API rate limit exceeded |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout |
| `FileTypeError` | Invalid file type |
| `AttributeError` | Invalid attribute access |

Source: [src/together/error.py:1-60](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

### Error Response Structure

```python
class TogetherErrorResponse(BaseModel):
    message: str
    type: str
    code: Optional[str] = None
    param: Optional[str] = None
```

### Common Error Scenarios

1. **Rate Limiting**: When API rate limits are exceeded, the SDK automatically retries with exponential backoff based on configuration.

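2. **Timeout**: Requests use a configurable timeout. The default comes from `TIMEOUT_SECS` in `together/constants.py`:

```python
# Default timeout is 600 seconds
TIMEOUT_SECS = 600
```

Source: [src/together/abstract/api_requestor.py:20-40](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)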

3. **Invalid Model**: Returns validation error with available model list

## Configuration Options

### Client Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `api_key` | string | env: TOGETHER_API_KEY | API authentication key |
| `base_url` | string | `https://api.together.xyz/v1` | API base URL |
| `timeout` | integer | 600 | Request timeout in seconds |
| `max_retries` | integer | 5 | Maximum retry attempts |

### Retry Configuration

```python
MAX_RETRIES = 5
INITIAL_RETRY_DELAY = 0.5  # seconds
MAX_RETRY_DELAY = 8.0  # seconds
MAX_CONNECTION_RETRIES = 2
MAX_SESSION_LIFETIME_SECS = 180
```

Source: [src/together/abstract/api_requestor.py:20-40](https://github.com/togethercomputer/together-python/blob/main/src/together/abstract/api_requestor.py)

## Known Limitations and Issues

### Dependency Conflicts

The SDK has a dependency on `typer<0.16.0`, which may cause conflicts with projects requiring newer versions of typer. This is a known issue tracked in [#348](https://github.com/togethercomputer/together-python/issues/348).

### Variable Scope Issue

A known `UnboundLocalError` issue can occur in certain error scenarios when the `result` variable is referenced before assignment. This is being tracked in [#143](https://github.com/togethercomputer/together-python/issues/143).

## Best Practices

### Efficient Usage

1. **Use Streaming for Long Outputs**: When expecting long completions, use streaming to improve perceived latency
2. **Batch Requests with Async**: Use `AsyncTogether` for parallel API calls
3. **Set Appropriate Limits**: Configure `max_tokens` to prevent excessive generation

### Production Considerations

1. **Implement Retry Logic**: The SDK handles retries, but implement additional logic for critical operations
2. **Monitor Token Usage**: Track usage via response `usage` field
3. **Use Safety Models**: Enable moderation for user-facing applications

## See Also

- [Chat Completions API](chat-completions) - Modern chat-based completion API
- [Fine-tuning Guide](fine-tuning) - Training custom models
- [Models Documentation](models) - Available models and selection
- [API Reference](api-reference) - Complete API documentation

---

<a id='page-embeddings-rerank'></a>

## Embeddings and Reranking

### Related Pages

Related topics: [Chat Completions](#page-chat-completions), [Files API](#page-files-api)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/resources/embeddings.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/embeddings.py)
- [src/together/resources/rerank.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/rerank.py)
- [src/together/types/embeddings.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/embeddings.py)
- [src/together/types/rerank.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/rerank.py)
- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
</details>

# Embeddings and Reranking

The Together Python SDK provides first-class support for text embeddings and document reranking through dedicated resource classes. These features enable semantic search, document retrieval, and information discovery workflows by converting text into dense vector representations and reordering search results based on relevance.

## Overview

Embeddings and reranking are complementary capabilities that power modern retrieval-augmented generation (RAG) and search systems. The SDK exposes these through the `embeddings` and `rerank` namespaces on the main `Together` client, following a consistent pattern with other API resources like chat completions.

```mermaid
graph LR
    A[Text Input] --> B[Embeddings API]
    B --> C[Vector Embeddings]
    C --> D[Reranking API]
    D --> E[Re-ranked Results]
    
    F[Query] --> D
    G[Document Pool] --> D
```

**Key characteristics:**

- Both endpoints use the same `Together` client instance
- Responses are returned as Pydantic model objects for type safety
- Both support synchronous and async patterns via `Together` and `AsyncTogether`
- Replacing newlines with spaces in input text is recommended before embedding

## Embeddings

### Purpose and Use Cases

The Embeddings API converts text into high-dimensional vector representations that capture semantic meaning. These vectors can be stored in vector databases and used for similarity search, clustering, or as features for downstream ML tasks.

Common use cases include:

- Semantic search systems
- Document clustering and categorization
- Recommendation systems
- Duplicate detection
- Feature extraction for classification tasks

### Python Client Usage

```python
from typing import List
from together import Together

client = Together()

def get_embeddings(texts: List[str], model: str) -> List[List[float]]:
    # Normalize newlines as recommended by the SDK
    texts = [text.replace("\n", " ") for text in texts]
    
    outputs = client.embeddings.create(model=model, input=texts)
    
    # Extract embedding vectors in order
    return [outputs.data[i].embedding for i in range(len(texts))]

# Example usage
input_texts = ["Our solar system orbits the Milky Way galaxy at about 515,000 mph"]
embeddings = get_embeddings(
    input_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval"
)
print(embeddings)
```
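
Once vectors are available, similarity search reduces to comparing them. The following sketch ranks documents by cosine similarity in pure Python; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the toy two-dimensional vectors stand in for real embeddings returned by `client.embeddings.create`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the top_k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors in place of real embeddings
ranking = rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], top_k=2)
print(ranking)  # document 2 first, then document 1
```

For large collections, a vector database or an array library would replace the Python loops.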

### Embeddings Response Model

The `EmbeddingsCreateResponse` model provides structured access to API responses:

| Field | Type | Description |
|-------|------|-------------|
| `object` | `str` | Object type, typically `"list"` |
| `data` | `List[Embedding]` | List of embedding objects |
| `model` | `str` | Model used for embeddings |
| `usage` | `EmbeddingUsage` | Token usage statistics |

Each `Embedding` object contains:

| Field | Type | Description |
|-------|------|-------------|
| `object` | `str` | Object type, typically `"embedding"` |
| `embedding` | `List[float]` | The embedding vector |
| `index` | `int` | Position in the input list |

The `EmbeddingUsage` object tracks:

| Field | Type | Description |
|-------|------|-------------|
| `prompt_tokens` | `int` | Tokens in the input |
| `total_tokens` | `int` | Total tokens processed |

### API Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model` | `str` | Yes | - | Embedding model identifier |
| `input` | `Union[str, List[str]]` | Yes | - | Text(s) to embed |

### Available Embedding Models

The SDK works with embedding models available on the Together platform. Common models include:

- `togethercomputer/m2-bert-80M-8k-retrieval` - 8K context, 80M parameters
- `togethercomputer/m2-bert-80M-2k-retrieval` - 2K context, 80M parameters

Model availability can be queried using:

```python
models = client.models.list()
# Filter for embedding models
```

## Reranking

### Purpose and Use Cases

The Reranking API takes a query and a set of documents, then returns those documents reordered by relevance to the query. This is particularly valuable when combined with embeddings-based retrieval to refine initial search results.

Common use cases include:

- Improving search result quality after initial embedding-based retrieval
- Multi-stage retrieval pipelines
- Reordering candidates from vector similarity search
- Question answering systems retrieving relevant context

### Python Client Usage

```python
from typing import List
from together import Together

client = Together()

def get_reranked_documents(
    query: str,
    documents: List[str],
    model: str,
    top_n: int = 3
) -> List[str]:
    outputs = client.rerank.create(
        model=model,
        query=query,
        documents=documents,
        top_n=top_n
    )

    # Order results by relevance score, then map each result back to its original document
    ranked = sorted(outputs.results, key=lambda r: r.relevance_score, reverse=True)
    return [documents[r.index] for r in ranked]

# Example usage
query = "What is the capital of the United States?"
documents = ["New York", "Washington, D.C.", "Los Angeles"]

# The model argument is required; substitute any rerank model available to your account
reranked = get_reranked_documents(
    query, documents, model="Salesforce/Llama-Rank-V1", top_n=3
)
print(reranked)  # e.g. ["Washington, D.C.", "New York", "Los Angeles"]; order depends on the model
```

### Reranking Response Model

The `RerankResponse` model provides structured access to reranking results:

| Field | Type | Description |
|-------|------|-------------|
| `id` | `str` | Request identifier |
| `results` | `List[Ranking]` | List of ranked documents |
| `meta` | `RerankMeta` | Metadata including model and usage |
| `object` | `str` | Object type |

Each `Ranking` object contains:

| Field | Type | Description |
|-------|------|-------------|
| `index` | `int` | Original document index |
| `relevance_score` | `float` | Relevance score (higher = more relevant) |
| `document` | `Document` | The document object with text |

The `Document` object:

| Field | Type | Description |
|-------|------|-------------|
| `text` | `str` | Document text content |

The `RerankMeta` object:

| Field | Type | Description |
|-------|------|-------------|
| `model_id` | `str` | Model used for reranking |
| `usage` | `RerankUsage` | Token usage statistics |

### API Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model` | `str` | Yes | - | Reranking model identifier |
| `query` | `str` | Yes | - | The query to rank documents against |
| `documents` | `List[str]` | Yes | - | Documents to be ranked |
| `top_n` | `int` | No | `3` | Number of top results to return |
| `max_chunks_per_doc` | `int` | No | `None` | Max chunks per document (model-dependent) |
| `return_documents` | `bool` | No | `True` | Whether to include document text in response |

## Combined Workflow

A typical retrieval pipeline combines embeddings and reranking:

```mermaid
graph TD
    A[User Query] --> B[Embed Query]
    C[Document Corpus] --> D[Embed All Documents]
    B --> E[Vector Similarity Search]
    D --> E
    E --> F[Candidate Documents]
    F --> G[Rerank with Query]
    G --> H[Final Results]
    
    I[Vector Database] <--> D
```

### Complete Example

```python
from typing import List
from together import Together

client = Together()

EMBEDDING_MODEL = "togethercomputer/m2-bert-80M-8k-retrieval"
RERANK_MODEL = "Salesforce/Llama-Rank-V1"  # substitute any rerank model available to your account

def semantic_search(
    query: str,
    documents: List[str],
    embedding_model: str = EMBEDDING_MODEL,
    rerank_model: str = RERANK_MODEL,
    top_k: int = 10,
    final_k: int = 3
) -> List[dict]:
    """
    Combined embeddings + reranking search pipeline.
    """
    # Step 1: Embed the query
    query_embedding = client.embeddings.create(
        model=embedding_model,
        input=query.replace("\n", " ")
    ).data[0].embedding
    
    # Step 2: Embed all documents
    doc_embeddings = client.embeddings.create(
        model=embedding_model,
        input=[doc.replace("\n", " ") for doc in documents]
    )
    
    # Step 3: Dot-product similarity (for demonstration).
    # This equals cosine similarity only when the vectors are unit-normalized.
    # In production, use a proper vector database
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings.data):
        similarity = sum(q * d for q, d in zip(query_embedding, doc_emb.embedding))
        similarities.append((i, similarity))
    
    # Sort by similarity and take top_k
    similarities.sort(key=lambda x: x[1], reverse=True)
    candidate_indices = [idx for idx, _ in similarities[:top_k]]
    candidate_docs = [documents[i] for i in candidate_indices]
    
    # Step 4: Rerank candidates
    rerank_results = client.rerank.create(
        model=rerank_model,
        query=query,
        documents=candidate_docs,
        top_n=final_k
    )
    
    # Step 5: Extract final results with scores.
    # result.index is a position in candidate_docs, so map it back to the corpus.
    results = []
    for result in rerank_results.results:
        results.append({
            "document": candidate_docs[result.index],
            "relevance_score": result.relevance_score,
            "original_index": candidate_indices[result.index]
        })
    
    return results

# Usage
query = "machine learning optimization techniques"
corpus = [
    "Gradient descent is a first-order iterative optimization algorithm.",
    "The capital of France is Paris.",
    "Stochastic gradient descent uses random subsets of data.",
    "Climate change affects global weather patterns.",
    "Adam optimizer combines momentum and RMSprop concepts."
]

results = semantic_search(query, corpus)
for r in results:
    print(f"Score: {r['relevance_score']:.4f} - {r['document']}")
```

## Async Usage

Both embeddings and reranking support asynchronous operations:

```python
import asyncio
from together import AsyncTogether

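async_client = AsyncTogether()

# Example batches; each inner list is sent as one embeddings request
batched_documents = [
    ["First document", "Second document"],
    ["Third document", "Fourth document"],
]

async def async_embeddings():
    tasks = [
        async_client.embeddings.create(
            model="togethercomputer/m2-bert-80M-8k-retrieval",
            input=texts
        )
        for texts in batched_documents
    ]
    results = await asyncio.gather(*tasks)
    return results

async def async_rerank():
    return await async_client.rerank.create(
        model="Salesforce/Llama-Rank-V1",
        query="What is deep learning?",
        documents=["Doc 1", "Doc 2", "Doc 3"],
        top_n=3
    )

async def main():
    embedding_results = await async_embeddings()
    rerank_result = await async_rerank()
    return embedding_results, rerank_result

# Run both coroutines on a single event loop
asyncio.run(main())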
```

## CLI Support

The CLI commands documented in this manual cover chat completions, completions, images, files, and fine-tuning; no dedicated embeddings or rerank command is documented. Passing an embedding model to `together completions` requests a text completion rather than an embedding vector, so it is not a substitute. To see which commands your installed version provides:

```bash
# List the available CLI commands
together --help
```

Note: For embeddings and reranking, use the Python API shown above.

## Error Handling

Both resources can raise standard Together exceptions defined in `src/together/error.py`:

| Error Type | Description |
|------------|-------------|
| `TogetherException` | Base exception class |
| `RateLimitError` | API rate limit exceeded |
| `APIConnectionError` | Network connectivity issues |

```python
from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.embeddings.create(
        model="togethercomputer/m2-bert-80M-8k-retrieval",
        input="Sample text"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except TogetherException as e:
    print(f"API error: {e}")
```

## Input Text Normalization

The SDK documentation recommends normalizing newline characters in input text:

```python
# Recommended: normalize input text
normalized_texts = [text.replace("\n", " ") for text in texts]

# Create embeddings
response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=normalized_texts
)
```

This normalization helps ensure consistent embedding quality across varied text inputs.

## Known Limitations

Based on community feedback and issue tracking:

1. **Model availability**: Embedding and reranking model availability may vary. Always verify model identifiers against the Together model marketplace.

2. **Batch sizes**: Large batches of documents may require multiple API calls. Consider batching strategies for large document collections.

3. **Token limits**: Both APIs have token limits that may restrict single-request document counts. Monitor `usage` fields in responses.
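
A simple batching strategy for large collections might look like the sketch below. The helpers `batched` and `embed_in_batches` are hypothetical, the default batch size of 32 is an arbitrary assumption, and `embed_batch` is a caller-supplied function so the example runs without network access.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed texts batch by batch, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in embedder: one-dimensional "vectors" based on text length
def fake_embed(batch: List[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2))
```

With the SDK, `embed_batch` could wrap `client.embeddings.create(model=..., input=batch)` and return the `embedding` of each item in `data`.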

## See Also

- [Chat Completions](./Chat-Completions.md) - Interactive text generation
- [Fine-tuning](./Fine-tuning.md) - Custom model training
- [Image Generation](./Image-Generation.md) - Image creation capabilities
- [Together API Documentation](https://docs.together.ai/) - Platform-level API reference
- [Contributing Guide](https://github.com/togethercomputer/together-python/blob/main/CONTRIBUTING.md) - SDK contribution guidelines

---

<a id='page-image-generation'></a>

## Image Generation

### Related Pages

Related topics: [Chat Completions](#page-chat-completions)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/resources/images.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/images.py)
- [src/together/types/images.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/images.py)
- [src/together/cli/api/images.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/images.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
</details>

# Image Generation

The Image Generation module in together-python provides programmatic access to Together AI's image generation API, so developers can create images from text prompts using diffusion models. The module supports synchronous and asynchronous requests, includes a CLI, and returns images as Base64-encoded data or URLs.

## Overview

The Image Generation feature is part of the Together AI Python SDK that abstracts the complexity of API communication and response parsing. It allows developers to:

- Generate images from text prompts using supported diffusion models
- Configure generation parameters such as dimensions, steps, and seed
- Supply negative prompts to steer generation away from unwanted elements
- Receive images as Base64-encoded data or URLs
- Use image generation alongside other SDK features, such as chat completions and embeddings, through the same client

Image generation is accessed through the `client.images` namespace in the main `Together` client, following a consistent pattern used throughout the SDK. Source: [src/together/resources/images.py:1-50](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/images.py)

## Architecture

### Component Overview

The image generation system consists of several interconnected components that work together to provide a unified interface:

```mermaid
graph TD
    A[User Code] --> B[Together Client]
    B --> C[Images Resource]
    C --> D[APIRequestor]
    D --> E[Together API]
    E --> F[ImageResponse]
    F --> G[User Code]
    
    H[CLI Command] --> C
    I[ImageCLI] --> B
    
    J[ImageRequest Type] --> C
    K[ImageResponse Type] --> F
```

### Module Structure

| Component | File Path | Purpose |
|-----------|-----------|---------|
| Images Resource | `src/together/resources/images.py` | Main API client for image generation |
| Image Types | `src/together/types/images.py` | Pydantic models for request/response validation |
| CLI Module | `src/together/cli/api/images.py` | Command-line interface for image generation |
| File Utils | `src/together/utils/files.py` | Helper utilities for file operations |

Source: [src/together/resources/images.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/images.py)

## API Reference

### Client Method: `client.images.generate()`

The primary method for generating images. On the synchronous `Together` client it is a regular method with the signature below; the asynchronous client exposes the same parameters through an awaitable method.

**Signature:**
```python
def generate(
    self,
    prompt: str,
    model: str,
    *,
    steps: int = 20,
    seed: Optional[int] = None,
    n: int = 1,
    height: int = 1024,
    width: int = 1024,
    negative_prompt: Optional[str] = None,
    **kwargs,
) -> ImageResponse
```

#### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | Required | Text description of the desired image |
| `model` | `str` | Required | Model identifier (e.g., `stabilityai/stable-diffusion-xl-base-1.0`) |
| `steps` | `int` | `20` | Number of diffusion steps |
| `seed` | `int` | `None` | Random seed for reproducible generation |
| `n` | `int` | `1` | Number of images to generate |
| `height` | `int` | `1024` | Output image height in pixels |
| `width` | `int` | `1024` | Output image width in pixels |
| `negative_prompt` | `str` | `None` | Prompt describing elements to avoid |
| `**kwargs` | `Any` | N/A | Additional model-specific parameters |

Source: [src/together/resources/images.py:36-60](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/images.py)

#### Returns

| Field | Type | Description |
|-------|------|-------------|
| `data` | `List[ImageChoicesData]` | List of generated image objects |
| `data[0].b64_json` | `str` | Base64-encoded PNG image data |
| `data[0].url` | `str` | Remote URL to the generated image (if available) |
| `data[0].revised_prompt` | `str` | Prompt revised by the model's safety filter |

Source: [src/together/types/images.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/images.py)

### Response Type: `ImageResponse`

The `ImageResponse` object wraps the API response with additional metadata:

```python
class ImageResponse(TogetherBaseResponse):
    data: List[ImageChoicesData]
```

Source: [src/together/types/images.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/images.py)
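
Because each item in `data` may carry a Base64 payload, a URL, or both, calling code often needs to branch on which field is populated. The following is a minimal sketch under that assumption; `image_bytes_or_none` and `image_source` are hypothetical helpers, not part of the SDK, and they only rely on the `b64_json` and `url` fields listed in the Returns table.

```python
import base64
from typing import Optional


def image_bytes_or_none(item) -> Optional[bytes]:
    """Return decoded image bytes, or None when the item has no Base64 payload."""
    b64_payload = getattr(item, "b64_json", None)
    if not b64_payload:
        return None
    return base64.b64decode(b64_payload)


def image_source(item) -> str:
    """Describe where the image data for one response item lives."""
    if getattr(item, "b64_json", None):
        return "inline"   # bytes are embedded in the response
    if getattr(item, "url", None):
        return "remote"   # bytes must be downloaded from item.url
    return "missing"
```

With a real response, `[image_source(item) for item in response.data]` gives a quick overview of what the API returned before any decoding or downloading takes place.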

## Usage Patterns

### Basic Synchronous Usage

```python
from together import Together

client = Together()

response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=4,
)

# Access base64-encoded images
for image_data in response.data:
    print(image_data.b64_json)

# Access revised prompt (if modified by safety filter)
for image_data in response.data:
    if image_data.revised_prompt:
        print(f"Revised prompt: {image_data.revised_prompt}")
```

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

### Using Seed for Reproducibility

```python
from together import Together

client = Together()

# Generate with a fixed seed for reproducible results
response = client.images.generate(
    prompt="a serene mountain landscape at sunset",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    seed=42,
    n=1,
    height=768,
    width=768,
)
```

### Multiple Images in Single Request

```python
import base64

from together import Together

client = Together()

response = client.images.generate(
    prompt="a bowl of fresh fruit",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Generate 4 variations
    width=512,
    height=512,
)

# Decode each Base64 payload and save it to disk
for idx, image_data in enumerate(response.data):
    image_bytes = base64.b64decode(image_data.b64_json)
    with open(f"generated_image_{idx}.png", "wb") as f:
        f.write(image_bytes)
```

## CLI Interface

The Together CLI provides a convenient interface for image generation without writing Python code.

### Command Structure

```bash
together images generate "prompt text" --model <MODEL_NAME> [OPTIONS]
```

Source: [src/together/cli/api/images.py:1-30](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/images.py)

### CLI Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--model` | `str` | Required | Model name to use for generation |
| `--steps` | `int` | `20` | Number of diffusion steps |
| `--seed` | `int` | `None` | Random seed for reproducibility |
| `--n` | `int` | `1` | Number of images to generate |
| `--height` | `int` | `1024` | Image height in pixels |
| `--width` | `int` | `1024` | Image width in pixels |
| `--negative-prompt` | `str` | `None` | Elements to avoid in generation |
| `--output` | `path` | `.` | Output directory for generated images |
| `--prefix` | `str` | `image-` | Filename prefix for saved images |
| `--no-show` | `flag` | `False` | Do not open images in viewer |

Source: [src/together/cli/api/images.py:31-70](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/images.py)

### CLI Usage Examples

**Basic image generation:**
```bash
together images generate "space robots" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --n 4
```

**Custom dimensions with reproducible seed:**
```bash
together images generate "mountain landscape" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --seed 12345 \
  --width 512 \
  --height 768 \
  --steps 30
```

**Save to specific directory without viewing:**
```bash
together images generate "abstract art" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output ./generated_images \
  --prefix "artwork-" \
  --no-show
```

### Image Display Behavior

By default, the CLI opens generated images in the system's default image viewer using Pillow (`PIL.Image`). Pass the `--no-show` flag to disable this behavior. Source: [src/together/cli/api/images.py:70-90](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/images.py)

## Supported Models

The Together AI platform supports various image generation models. The SDK allows any compatible model identifier to be passed directly:

| Model Family | Example Model Identifier | Typical Use |
|--------------|-------------------------|--------------|
| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0` | General purpose generation |
| Flux | `black-forest-labs/FLUX.1-dev` | High-quality artistic generation |
| Playground | `playgroundai/playground-v2.5` | Versatile creative work |

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

To list all available image generation models programmatically:

```python
from together import Together

client = Together()
models = client.models.list()

# Filter for image models
image_models = [m for m in models.data if m.type == "image"]
for model in image_models:
    print(f"{model.display_name}: {model.name}")
```

## Request/Response Flow

```mermaid
sequenceDiagram
    participant User
    participant Client
    participant ImagesResource
    participant APIRequestor
    participant TogetherAPI
    participant ImageResponse
    
    User->>Client: client.images.generate(...)
    Client->>ImagesResource: generate(prompt, model, ...)
    ImagesResource->>ImageRequest: Create ImageRequest
    ImagesResource->>APIRequestor: request(POST /images/generations)
    APIRequestor->>TogetherAPI: HTTP POST Request
    TogetherAPI-->>APIRequestor: JSON Response
    APIRequestor-->>ImagesResource: TogetherResponse
    ImagesResource->>ImageResponse: Parse response data
    ImageResponse-->>User: ImageResponse with image data
    
    Note over User,TogetherAPI: Base64 images available in response.data[].b64_json
```

Source: [src/together/resources/images.py:40-70](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/images.py)

## Common Issues and Troubleshooting

### Pillow Version Compatibility

Some users have reported transitive dependency conflicts with the `pillow` library. The SDK depends on specific pillow versions for image handling and display features in the CLI. If you encounter conflicts with other packages requiring newer pillow versions, consider using separate virtual environments. Source: [GitHub Issue #237](https://github.com/togethercomputer/together-python/issues/237)

### Large Image Base64 Handling

Generated images are returned as Base64-encoded strings in the `b64_json` field. Base64 encoding inflates a payload by roughly one third, so when processing large or multiple images, make sure your application has enough memory for both the encoded string and the decoded bytes. The SDK does not impose a maximum size on generated images; the 10MB Base64 limit enforced in `utils/files.py` is documented as a validation rule for images embedded in fine-tuning datasets. Source: [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
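
To budget memory before decoding, the decoded size can be computed from the length of the Base64 string alone. This is a small sketch, assuming standard padded Base64 without embedded line breaks; `decoded_size` is a hypothetical helper, not an SDK function.

```python
import base64


def decoded_size(b64_payload: str) -> int:
    """Number of bytes a padded Base64 string decodes to, without decoding it."""
    padding = len(b64_payload) - len(b64_payload.rstrip("="))
    # Every 4 Base64 characters encode 3 bytes; each "=" removes one byte.
    return len(b64_payload) * 3 // 4 - padding


sample = base64.b64encode(b"hello world").decode("ascii")
print(decoded_size(sample), len(base64.b64decode(sample)))
```

Summing `decoded_size(item.b64_json)` over `response.data` gives the total number of bytes the decoded images will occupy.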

### API Key Configuration

Image generation requires a valid Together API key. Ensure the `TOGETHER_API_KEY` environment variable is set or passed directly to the client:

```python
# Via environment variable
# export TOGETHER_API_KEY=your_api_key

client = Together()  # Reads from environment

# Or explicitly
client = Together(api_key="your_api_key")
```

### Rate Limiting

Like other API endpoints, image generation is subject to rate limits. If you encounter `RateLimitError`, implement exponential backoff in your application:

```python
import time
from together import Together
from together.error import RateLimitError

client = Together()
max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.images.generate(
            prompt="your prompt",
            model="stabilityai/stable-diffusion-xl-base-1.0"
        )
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            time.sleep(wait_time)
        else:
            raise
```

Source: [src/together/error.py:40-55](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
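
The retry loop above can be factored into a reusable helper so that other API calls share the same policy. The sketch below is one possible shape, not an SDK feature: `call_with_backoff` and its parameters are hypothetical, and the added random jitter is a common refinement that keeps many clients from retrying at the same instant.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on `retry_on` with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter on top of the exponential delay.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_retries must be at least 1")
```

With the SDK, this could be called as `call_with_backoff(lambda: client.images.generate(prompt="your prompt", model="stabilityai/stable-diffusion-xl-base-1.0"), retry_on=(RateLimitError,))`. The `sleep` parameter exists so tests can substitute a fake and run without waiting.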

## Error Handling

The SDK provides specific exception types for various error conditions:

| Exception Type | Description |
|----------------|-------------|
| `TogetherException` | Base exception for all SDK errors |
| `RateLimitError` | API rate limit exceeded |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout |

Source: [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

Example error handling:

```python
from together import Together
from together.error import TogetherException, RateLimitError, Timeout

client = Together()

try:
    response = client.images.generate(
        prompt="test image",
        model="stabilityai/stable-diffusion-xl-base-1.0"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except Timeout:
    print("Request timed out. The image may be complex - try with fewer steps.")
except TogetherException as e:
    print(f"API error: {e}")
```

## Best Practices

### 1. Optimize Image Dimensions

For faster generation, use smaller dimensions initially and upscale if needed:

```python
# Faster initial generation
response = client.images.generate(
    prompt="landscape",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    height=512,
    width=512,
    steps=10,  # Fewer than the documented CLI default of 20, for a quick draft
)
```

### 2. Use Seeds for Iteration

When refining a concept, use a fixed seed to maintain consistency:

```python
base_seed = 42

# Generate variations while maintaining composition
for i in range(4):
    response = client.images.generate(
        prompt=f"landscape with {'spring' if i % 2 == 0 else 'autumn'} colors",
        model="stabilityai/stable-diffusion-xl-base-1.0",
        seed=base_seed,
    )
```

### 3. Batch Generation

Generate multiple images in a single request when possible for efficiency:

```python
response = client.images.generate(
    prompt="concept variations",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Single API call for 4 images
)
```

### 4. Handle Revised Prompts

The API may modify prompts for safety reasons. Always check for revised prompts:

```python
prompt = "your prompt here"

response = client.images.generate(
    prompt=prompt,
    model="stabilityai/stable-diffusion-xl-base-1.0",
)

# Compare against the original prompt to detect a revision
for image_data in response.data:
    if image_data.revised_prompt and image_data.revised_prompt != prompt:
        print(f"Prompt was revised to: {image_data.revised_prompt}")
```

## See Also

- [Chat Completions](#page-chat-completions) - Text generation with LLMs
- [Embeddings](#page-embeddings-rerank) - Text vectorization
- [Fine-tuning](#page-finetuning) - Custom model training
- [Together AI Documentation](https://docs.together.ai/docs/image-generation-quickstart) - Official platform documentation

---

<a id='page-files-api'></a>

## Files API

### Related Pages

Related topics: [Fine-Tuning](#page-finetuning)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/filemanager.py](https://github.com/togethercomputer/together-python/blob/main/src/together/filemanager.py)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
- [src/together/cli/api/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/files.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)
- [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)
</details>

# Files API

The Files API provides capabilities for uploading, managing, and validating training datasets for use with Together AI's fine-tuning services. It serves as the foundation for preparing training data that powers model customization workflows.

## Overview

The Files API enables developers to:

- **Upload training datasets** in JSONL format for fine-tuning jobs
- **Validate file content** locally before uploading to catch formatting errors early
- **Manage remote files** (list, retrieve, delete) on Together's infrastructure
- **Support multimodal content** including text and image data for vision model training

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

## Architecture

```mermaid
graph TD
    A[User Code / CLI] --> B[Files API Client]
    B --> C[FileManager]
    C --> D[Together API]
    
    E[Local Validation] --> B
    E --> F[files.py utils]
    F --> G[JSONL Parser]
    G --> H[Content Validators]
    
    I[Fine-tuning] --> D
    I --> C
    
    style D fill:#e1f5fe
    style C fill:#fff3e0
    style F fill:#f3e5f5
```

### Component Overview

| Component | File | Responsibility |
|-----------|------|-----------------|
| `Together` client | `client.py` | Main API entry point |
| `FileManager` | `filemanager.py` | Handles file operations |
| `files.py` utils | `utils/files.py` | Local validation and parsing |
| CLI commands | `cli/api/files.py` | Command-line interface |

Source: [src/together/filemanager.py](https://github.com/togethercomputer/together-python/blob/main/src/together/filemanager.py)

## File Validation

The SDK validates files locally through the `files.py` utility module. Validation runs before upload, so formatting errors surface immediately instead of as failed fine-tuning jobs.

### Validation Rules

The validator checks multiple aspects of your JSONL files:

| Validation Rule | Description | Error Type |
|-----------------|-------------|------------|
| `content` field type | Must be a list of dicts | `InvalidFileFormatError` |
| `type` field presence | Each item must have a `type` field | `InvalidFileFormatError` |
| Text content | For `type: "text"`, must have valid `text` string | `InvalidFileFormatError` |
| Image content | For `type: "image_url"`, must have valid `image_url` dict | `InvalidFileFormatError` |
| Image size | Base64 images must be under 10MB | `InvalidFileFormatError` |
| Image limit | Maximum 10 images per example | `InvalidFileFormatError` |
| Image role | Images only allowed in user messages | `InvalidFileFormatError` |

Source: [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)

### Supported Content Types

```python
# Text content
{"type": "text", "text": "The training prompt here"}

# Image URL content
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}

# Base64 image content
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
```
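
For the Base64 variant, a local image has to be turned into a `data:` URL before it is written to the dataset. A minimal sketch follows; `to_data_url` and `image_content_item` are hypothetical helper names, and the MIME type must match the actual image format.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_content_item(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build an `image_url` content item in the shape shown above."""
    return {"type": "image_url", "image_url": {"url": to_data_url(image_bytes, mime_type)}}
```

In practice, `image_bytes` would come from reading a file in binary mode, for example `open("photo.jpg", "rb").read()`.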

### Multimodal Dataset Structure

The validator supports multimodal datasets for vision model fine-tuning:

```mermaid
graph LR
    A[JSONL Line] --> B{Parse content}
    B -->|List| C[Validate each item]
    B -->|String| D[Plain text]
    
    C --> E{type == "text"?}
    C --> F{type == "image_url"?}
    
    E -->|Yes| G[Validate text field]
    F -->|Yes| H[Validate image_url dict]
    F -->|No| I[Error: Unknown type]
    
    H --> J{URL or Base64?}
    J -->|Base64| K[Check size < 10MB]
    K --> L[Count images]
    J -->|URL| L
```

Source: [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
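
The flow in the diagram can be approximated in plain Python. The sketch below is not the SDK's implementation: the function names, the `ContentValidationError` class (a stand-in for `InvalidFileFormatError`), and the reading of 10MB as 10 MiB of encoded data are all assumptions made for illustration.

```python
from typing import Any, Dict, List

MAX_IMAGES_PER_EXAMPLE = 10
MAX_BASE64_IMAGE_CHARS = 10 * 1024 * 1024  # assumption: 10MB measured on the encoded payload


class ContentValidationError(ValueError):
    """Hypothetical stand-in for the SDK's InvalidFileFormatError."""


def count_images_in_message(message: Dict[str, Any]) -> int:
    """Validate one message and return how many images it contains."""
    content = message.get("content")
    if isinstance(content, str):
        return 0  # plain-text message: nothing further to check
    if not isinstance(content, list):
        raise ContentValidationError("content must be a string or a list of dicts")
    images = 0
    for item in content:
        if not isinstance(item, dict) or "type" not in item:
            raise ContentValidationError("each content item needs a 'type' field")
        if item["type"] == "text":
            if not isinstance(item.get("text"), str):
                raise ContentValidationError("text items need a 'text' string")
        elif item["type"] == "image_url":
            if message.get("role") != "user":
                raise ContentValidationError("images are only allowed in user messages")
            image_url = item.get("image_url")
            if not isinstance(image_url, dict) or not isinstance(image_url.get("url"), str):
                raise ContentValidationError("image_url must be a dict with a 'url' string")
            url = image_url["url"]
            if url.startswith("data:"):
                _, _, payload = url.partition(",")
                if len(payload) >= MAX_BASE64_IMAGE_CHARS:
                    raise ContentValidationError("Base64 image must be under 10MB")
            images += 1
        else:
            raise ContentValidationError(f"unknown content type: {item['type']!r}")
    return images


def validate_example(messages: List[Dict[str, Any]]) -> int:
    """Validate all messages of one example and enforce the per-example image cap."""
    total = sum(count_images_in_message(message) for message in messages)
    if total > MAX_IMAGES_PER_EXAMPLE:
        raise ContentValidationError("at most 10 images are allowed per example")
    return total
```

For real datasets, prefer `together files check`, which runs the SDK's own validator; a sketch like this is mainly useful for understanding why a particular record is rejected.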

## Python Client Usage

### Initialization

```python
from together import Together

client = Together()
```

The client automatically reads the `TOGETHER_API_KEY` environment variable. You can also pass the key explicitly:

```python
client = Together(api_key="your-api-key-here")
```

### File Operations

#### Upload a File

```python
response = client.files.upload(
    file="training_data.jsonl",  # path to the local JSONL file
    purpose="fine-tune",
)
print(response.id)
```

#### List Files

```python
files = client.files.list()

for file in files.data:
    print(f"ID: {file.id}, Filename: {file.filename}, Size: {file.bytes}")
```

#### Retrieve File Metadata

```python
file_info = client.files.retrieve(id="file-xxxxx")
print(f"Created: {file_info.created_at}")
print(f"Filename: {file_info.filename}")
```

#### Retrieve File Content

```python
content = client.files.retrieve_content(id="file-xxxxx")
print(content)
```

#### Delete a File

```python
result = client.files.delete(id="file-xxxxx")
print(result.deleted)
```

Source: [src/together/filemanager.py](https://github.com/togethercomputer/together-python/blob/main/src/together/filemanager.py)

## CLI Usage

The `together files` command provides a command-line interface for file operations.

### Command Overview

```bash
together files --help
```

| Command | Description |
|---------|-------------|
| `together files check` | Validate a local file before uploading |
| `together files upload` | Upload a file to Together AI |
| `together files list` | List all uploaded files |
| `together files retrieve` | Get file metadata |
| `together files retrieve-content` | Download file content |
| `together files delete` | Delete a remote file |

Source: [README.md](https://github.com/togethercomputer/together-python/blob/main/README.md)

### Check File (Local Validation)

Validate your JSONL file locally before uploading:

```bash
together files check example.jsonl
```

This runs the same validation logic that the SDK uses, checking:
- JSONL format validity
- Content structure
- Multimodal content rules
- Image size limits
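
The first of these checks, JSONL format validity, is easy to reproduce for quick debugging. The sketch below is a simplified illustration rather than the SDK's validator: `find_invalid_jsonl_lines` is a hypothetical helper, and treating blank lines and non-object records as errors is an assumption.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_jsonl_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that are not JSON objects."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((line_number, "empty line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "expected a JSON object"))
    return problems
```

Since an open text file iterates line by line, the helper can be applied directly: `with open("example.jsonl", encoding="utf-8") as f: print(find_invalid_jsonl_lines(f))`.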

### Upload a File

```bash
together files upload example.jsonl
```

### List Files

```bash
together files list
```

### Retrieve File Metadata

```bash
together files retrieve file-6f50f9d1-5b95-416c-9040-0799b2b4b894
```

### Retrieve File Content

```bash
together files retrieve-content file-6f50f9d1-5b95-416c-9040-0799b2b4b894
```

### Delete a Remote File

```bash
together files delete file-6f50f9d1-5b95-416c-9040-0799b2b4b894
```

## Data Flow for Fine-tuning

The Files API integrates directly with the Fine-tuning API. Here's how files flow through the system:

```mermaid
sequenceDiagram
    participant User
    participant CLI as Files CLI
    participant SDK as Python SDK
    participant API as Together API
    participant FT as Fine-tuning
    
    User->>CLI: together files upload data.jsonl
    CLI->>SDK: client.files.upload()
    SDK->>SDK: Validate locally
    SDK->>API: POST /v1/files
    API-->>SDK: {id: "file-xxxxx"}
    SDK-->>CLI: File upload response
    
    User->>CLI: together fine-tuning create
    CLI->>SDK: client.fine_tuning.create(training_file="file-xxxxx")
    SDK->>API: POST /v1/fine-tunes
    API-->>SDK: {id: "ft-xxxxx"}
    SDK-->>CLI: Fine-tuning job response
```

## Error Handling

The SDK defines several exception types for file-related errors:

| Exception | Use Case |
|-----------|----------|
| `TogetherException` | Base exception class |
| `FileTypeError` | Invalid file type or format |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout |

Source: [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

### Handling Upload Errors

```python
from together import Together
from together.error import FileTypeError, APIConnectionError

client = Together()

try:
    response = client.files.upload(
        file="data.jsonl",  # path to the local JSONL file
        purpose="fine-tune",
    )
except FileTypeError as e:
    print(f"Invalid file format: {e}")
except APIConnectionError as e:
    print(f"Connection error: {e}")
```

## Common Issues

### File Format Validation Failures

The local validation (`together files check`) should be run before uploading. This catches the most common issues:

1. **Missing `type` field**: Every content item must have a `type` field
2. **Invalid `type` value**: Must be either `"text"` or `"image_url"`
3. **Missing `text` field**: Text items must have a `text` string field
4. **Image in non-user message**: Images are only allowed in user roles
5. **Base64 size exceeded**: Images must be under 10MB when base64-encoded

### Fine-tuning Integration

Files uploaded via the Files API can be used in fine-tuning jobs:

```python
from together import Together

client = Together()

# Upload training file
training_file = client.files.upload(
    file="train.jsonl",  # path to the local JSONL file
    purpose="fine-tune",
)

# Create fine-tuning job with uploaded file
job = client.fine_tuning.create(
    training_file=training_file.id,
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct"
)
```

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)

## Configuration Options

### File Upload Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | `str` or `Path` | Yes | Path to the local file to upload |
| `purpose` | string | No | Intended use; defaults to `"fine-tune"` |

### File Check Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | Yes | Path to local file |

## Best Practices

1. **Always validate locally first**: Run `together files check` before uploading to catch format errors early
2. **Use descriptive filenames**: Makes files easier to identify in the file list
3. **Check file size**: Large files may take longer to upload and process
4. **Verify JSONL format**: Ensure each line is valid JSON
5. **Test with small dataset first**: Validate your pipeline with a subset before full upload
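
For the last practice, a subset file can be produced by copying the first few records of the full dataset. This is a minimal sketch; `write_jsonl_subset` is a hypothetical helper and the default of 100 lines is arbitrary.

```python
from itertools import islice


def write_jsonl_subset(source: str, destination: str, max_lines: int = 100) -> int:
    """Copy the first `max_lines` lines of a JSONL file and return the count written."""
    written = 0
    with open(source, "r", encoding="utf-8") as src, \
            open(destination, "w", encoding="utf-8") as dst:
        for line in islice(src, max_lines):
            # Make sure every record ends with a newline, including the last one.
            dst.write(line if line.endswith("\n") else line + "\n")
            written += 1
    return written
```

The subset can then be checked with `together files check` and uploaded like any other file. Taking the first lines is the simplest option; if the dataset is ordered by topic or source, a random sample may be more representative.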

## See Also

- [Fine-tuning Guide](#page-finetuning) - Complete fine-tuning workflow using uploaded files
- [Chat Completions](#page-chat-completions) - Using models after fine-tuning
- [CLI Reference](./CLI-Reference.md) - Complete CLI documentation

---

<a id='page-finetuning'></a>

## Fine-Tuning

### Related Pages

Related topics: [Files API](#page-files-api)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)
- [src/together/cli/api/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/finetune.py)
- [src/together/legacy/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/legacy/finetune.py)
- [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py)
- [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)
- [src/together/types/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/types/finetune.py)
</details>

# Fine-Tuning

The Fine-Tuning module in the Together Python SDK provides an interface for fine-tuning foundation models on the Together platform. It lets developers adapt pre-trained models to their use cases through full supervised fine-tuning, LoRA (Low-Rank Adaptation), and preference-based alignment methods such as DPO (Direct Preference Optimization).

## Overview

Fine-tuning transforms a pre-trained model into a specialized tool tailored for specific tasks, domains, or behaviors. The Together platform supports multiple fine-tuning methodologies:

| Training Method | Description | Use Case |
|----------------|-------------|----------|
| **Full Training** | Updates all model weights | Maximum customization, larger datasets |
| **LoRA** | Low-Rank Adaptation with adapter weights | Efficient fine-tuning, lower compute costs |
| **DPO** | Direct Preference Optimization | Alignment and preference learning |
| **RPO** | Relative Preference Optimization | Alternative alignment approach |
| **SimPO** | Simple Preference Optimization | Simplified alignment without reference model |

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py)

## Architecture

The fine-tuning system follows a layered architecture with the `FineTuning` class serving as the primary interface:

```mermaid
graph TD
    A[User Application] --> B[Together Client]
    B --> C[FineTuning Class]
    C --> D[APIRequestor]
    D --> E[Together Inference API]
    
    F[CLI Commands] --> C
    G[Legacy API] --> C
    
    H[File Validation] --> C
    I[Checkpoint Management] --> C
    J[Price Estimation] --> C
```

### Core Components

| Component | Location | Purpose |
|-----------|----------|---------|
| `FineTuning` | `resources/finetune.py` | Main API interface for fine-tuning operations |
| `FineTuneCreateRequest` | `types/finetune.py` | Request payload model for job creation |
| CLI Commands | `cli/api/finetune.py` | Command-line interface for fine-tuning |
| Legacy API | `legacy/finetune.py` | Backward-compatible wrapper functions |
| File Validation | `utils/files.py` | Dataset file format validation |

## Creating Fine-Tuning Jobs

### Python Client

The `FineTuning.create()` method initiates a new fine-tuning job. The method accepts numerous parameters to customize the training process:

```python
from together import Together

client = Together()

response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    validation_file="file-def456",
    n_epochs=3,
    batch_size=4,
    learning_rate=1e-5,
    suffix="my-custom-model",
    wandb_api_key="your-wandb-key",
    wandb_project_name="my-project",
)
print(response)
```

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py#L100-L150)

### Supported Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | `str` | Required | Base model identifier (e.g., `meta-llama/Llama-3-8b-hf`) |
| `training_file` | `str` | Required | Uploaded training file ID |
| `validation_file` | `str` | Optional | Uploaded validation file ID |
| `n_epochs` | `int` | `3` | Number of training epochs |
| `n_checkpoints` | `int` | `1` | Number of checkpoints to save |
| `batch_size` | `int` | Auto | Training batch size |
| `learning_rate` | `float` | `1e-5` | Initial learning rate |
| `lr_scheduler_type` | `str` | `cosine` | Learning rate scheduler |
| `warmup_ratio` | `float` | `0.1` | Warmup ratio for learning rate |
| `weight_decay` | `float` | `0.01` | Weight decay coefficient |
| `max_grad_norm` | `float` | `1.0` | Maximum gradient norm |
| `suffix` | `str` | `None` | Custom suffix for output model name |
| `lora` | `bool` | `False` | Enable LoRA fine-tuning |
| `lora_r` | `int` | `8` | LoRA attention dimension |
| `lora_dropout` | `float` | `0.05` | LoRA dropout probability |
| `lora_alpha` | `int` | `16` | LoRA alpha parameter |
| `train_on_inputs` | `bool` | `None` | Mask user messages in training |
| `train_vision` | `bool` | `False` | Train vision encoder (multimodal models) |
| `training_method` | `str` | `sft` | Training method (dpo, rpo, simpo) |
| `from_checkpoint` | `str` | `None` | Resume from previous job checkpoint |
| `from_hf_model` | `str` | `None` | HuggingFace model to continue training from |

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py#L80-L120)
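Because most of these parameters are optional, one way to keep call sites tidy is to assemble the keyword arguments separately and drop anything left unset. The sketch below is a hypothetical helper (`build_create_kwargs` is not part of the SDK). It uses only parameter names from the table above and assumes that LoRA hyperparameters should be forwarded only when `lora=True`.

```python
from typing import Any, Dict, Optional


def build_create_kwargs(
    model: str,
    training_file: str,
    *,
    validation_file: Optional[str] = None,
    n_epochs: Optional[int] = None,
    learning_rate: Optional[float] = None,
    suffix: Optional[str] = None,
    lora: bool = False,
    lora_r: Optional[int] = None,
    lora_alpha: Optional[int] = None,
    lora_dropout: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect arguments for client.fine_tuning.create()."""
    if not model or not training_file:
        raise ValueError("model and training_file are required")

    kwargs: Dict[str, Any] = {
        "model": model,
        "training_file": training_file,
        "validation_file": validation_file,
        "n_epochs": n_epochs,
        "learning_rate": learning_rate,
        "suffix": suffix,
        "lora": lora,
    }
    if lora:
        # Assumption: LoRA hyperparameters are only meaningful for LoRA jobs.
        kwargs.update(lora_r=lora_r, lora_alpha=lora_alpha, lora_dropout=lora_dropout)

    # Drop unset values so that the defaults listed in the table apply.
    return {key: value for key, value in kwargs.items() if value is not None}
```

The result can then be passed on as `client.fine_tuning.create(**build_create_kwargs(...))`.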

### Async Support

For asynchronous applications, use `AsyncTogether` with the async `FineTuning` methods:

```python
import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def create_ft_job():
    response = await async_client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
        n_epochs=3,
    )
    return response

result = asyncio.run(create_ft_job())
```

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py#L200-L250)
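When several jobs are submitted from async code, it can help to cap how many requests are in flight at once. The following is a minimal sketch, not an SDK feature: `create_many` is a hypothetical helper that accepts any coroutine function with the same keyword interface as `async_client.fine_tuning.create`, and `max_concurrency` is an illustrative parameter.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Sequence


async def create_many(
    create: Callable[..., Awaitable[Any]],
    job_specs: Sequence[Dict[str, Any]],
    max_concurrency: int = 2,
) -> List[Any]:
    """Submit every spec through `create`, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def submit(spec: Dict[str, Any]) -> Any:
        async with semaphore:
            return await create(**spec)

    # gather() returns results in the same order as job_specs.
    return await asyncio.gather(*(submit(spec) for spec in job_specs))
```

With the client from the example above, this would be called as `asyncio.run(create_many(async_client.fine_tuning.create, specs))`, where each spec is a dict of `create()` keyword arguments.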

## Managing Fine-Tuning Jobs

### Job Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Created: create()
    Created --> Queued: Submitted
    Queued --> Running: Started
    Running --> Completed: Success
    Running --> Failed: Error
    Running --> Cancelled: cancel()
    Queued --> Cancelled: cancel()
```

### Listing Jobs

Retrieve all fine-tuning jobs associated with your account:

```python
response = client.fine_tuning.list()
for job in response.data:
    print(f"ID: {job.id}, Model: {job.model}, Status: {job.status}")
```

### Retrieving Job Details

Get detailed information about a specific fine-tuning job:

```python
job = client.fine_tuning.retrieve(id="ft-job-abc123")
print(f"Status: {job.status}")
print(f"Training steps: {job.training_steps}")
print(f"Output model: {job.output_name}")
```
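Since jobs run for a long time, a common pattern is to poll `retrieve()` until the job stops changing state. The sketch below is a hypothetical helper, not part of the SDK. It takes a callable that returns the current status as a plain string, and the set of terminal status names is an assumption based on the lifecycle diagram, so check it against the values your SDK version reports.

```python
import time
from typing import Callable, FrozenSet

# Assumption: status names follow the lifecycle diagram ("error" is included
# as a possible alternative spelling of the failed state).
TERMINAL_STATUSES: FrozenSet[str] = frozenset(
    {"completed", "failed", "error", "cancelled"}
)


def wait_for_job(
    fetch_status: Callable[[], str],
    *,
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    terminal_statuses: FrozenSet[str] = TERMINAL_STATUSES,
) -> str:
    """Call fetch_status() until it returns a terminal status, then return it."""
    waited = 0.0
    while True:
        status = fetch_status().lower()
        if status in terminal_statuses:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

The `fetch_status` callable would typically wrap `client.fine_tuning.retrieve(id=...)` and return its `status`. If the SDK returns an enum rather than a string, pass the enum's `.value`.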

### Cancelling Jobs

Abort a running or queued fine-tuning job:

```python
result = client.fine_tuning.cancel(id="ft-job-abc123")
```

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py#L300-L400)

## Checkpoint Management

Checkpoints enable resuming training from intermediate states and retrieving model weights for deployment.

### Retrieving Checkpoints

```python
checkpoints = client.fine_tuning.checkpoints(id="ft-job-abc123")
for checkpoint in checkpoints.data:
    print(f"Step: {checkpoint.step}, Type: {checkpoint.checkpoint_type}")
```

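Internally, the `_parse_raw_checkpoints()` helper converts raw checkpoint metadata into `FinetuneCheckpoint` objects. Intermediate checkpoints are named `{id}:{step}`, while any other checkpoint is named after the job ID alone (excerpt):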

```python
parsed_checkpoints = []
for checkpoint in checkpoints:
    step = checkpoint["step"]
    checkpoint_type = checkpoint["checkpoint_type"]
    checkpoint_name = (
        f"{id}:{step}" if "intermediate" in checkpoint_type.lower() else id
    )
    parsed_checkpoints.append(
        FinetuneCheckpoint(
            type=checkpoint_type,
            timestamp=checkpoint["created_at"],
            name=checkpoint_name,
        )
    )
```

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py#L150-L180)
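The excerpt above depends on names defined elsewhere in the SDK, so it does not run on its own. Below is a self-contained sketch of the same naming rule. `CheckpointInfo` is a stand-in dataclass rather than the SDK's `FinetuneCheckpoint`, and the sample checkpoint type strings used with it are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CheckpointInfo:
    """Stand-in for the SDK's FinetuneCheckpoint (hypothetical, for illustration)."""

    type: str
    timestamp: str
    name: str


def parse_checkpoints(raw: List[Dict[str, Any]], job_id: str) -> List[CheckpointInfo]:
    """Apply the naming rule from the excerpt to a list of raw checkpoint dicts."""
    parsed = []
    for item in raw:
        checkpoint_type = item["checkpoint_type"]
        # Intermediate checkpoints are addressed as "<job id>:<step>";
        # any other type resolves to the job id itself.
        if "intermediate" in checkpoint_type.lower():
            name = f"{job_id}:{item['step']}"
        else:
            name = job_id
        parsed.append(CheckpointInfo(checkpoint_type, item["created_at"], name))
    return parsed
```

The `<job id>:<step>` form is the same shape as the value passed to `from_checkpoint` when resuming a job.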

### Download Checkpoints

Download fine-tuned model weights using the CLI:

```bash
# Download latest checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Download specific checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-step 1000

# Download with specific checkpoint type
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-type merged
```

#### Checkpoint Types

| Type | Description | Applicable Training |
|------|-------------|---------------------|
| `default` | Default output format | All |
| `merged` | Merged with base model (LoRA only) | LoRA |
| `adapter` | Adapter weights only (LoRA only) | LoRA |
| `model_output_path` | Full model output (Full only) | Full |

Source: [src/together/cli/api/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/finetune.py#L50-L80)

### Download Options

| CLI Option | Description |
|------------|-------------|
| `--output_dir`, `-o` | Output directory for downloaded files |
| `--checkpoint-step`, `-s` | Specific checkpoint step to download |
| `--checkpoint-type` | Checkpoint type (default, merged, adapter) |

The same options are available from the Python client:

```python
from together.types.finetune import DownloadCheckpointType

result = client.fine_tuning.download(
    id="ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b",
    output="./model-output",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED,
)
print(f"Downloaded to: {result.filename}")
```

## CLI Commands

The Together CLI provides a comprehensive set of commands for fine-tuning operations:

### Create a Fine-Tuning Job

```bash
together fine-tuning create \
    --model meta-llama/Llama-3-8b-hf \
    --training-file file-abc123 \
    --n-epochs 3 \
    --suffix my-custom-model
```

### List Fine-Tuning Jobs

```bash
together fine-tuning list
```

### Retrieve Job Details

```bash
together fine-tuning retrieve ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b
```

### Cancel a Job

```bash
# With confirmation prompt
together fine-tuning cancel ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b
```

### Delete a Job

```bash
together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Force deletion without confirmation
together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --force
```

Source: [src/together/cli/api/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/finetune.py#L1-L150)

## Weights & Biases Integration

The SDK supports automatic logging to Weights & Biases for experiment tracking:

```bash
together fine-tuning create \
    --model meta-llama/Llama-3-8b-hf \
    --training-file file-abc123 \
    --wandb-api-key your-api-key \
    --wandb-project-name my-project \
    --wandb-name my-experiment-run
```

| Parameter | Description |
|-----------|-------------|
| `--wandb-api-key` | Weights & Biases API key |
| `--wandb-project-name` | W&B project name |
| `--wandb-name` | W&B run name |
| `--wandb-base-url` | W&B base URL (for enterprise deployments) |

Source: [src/together/cli/api/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/cli/api/finetune.py#L100-L140)

## File Format Requirements

Training and validation files must follow specific JSONL (JSON Lines) format requirements:

### Text Format

Each line is a JSON object with a single `text` field:

```json
{"text": "What is the capital of France?\nAnswer: Paris"}
```
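JSONL means exactly one JSON object per line, so any newline inside a record has to be escaped rather than written literally. A minimal sketch of writing and re-reading such a file with the standard library follows. `write_jsonl` and `read_jsonl` are illustrative helper names, not SDK functions.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_jsonl(records: Iterable[Dict[str, Any]], path: Path) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Parse a JSONL file, failing loudly on the first malformed line."""
    rows = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return rows
```

Reading the file back before uploading is a cheap local check that every line parses.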

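### Chat/Conversation Format

Each line holds one complete conversation under a `messages` key. Individual messages are not written as separate lines:

```json
{"messages": [{"role": "user", "content": [{"type": "text", "text": "What is the capital of France?"}]}, {"role": "assistant", "content": [{"type": "text", "text": "Paris"}]}]}
```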

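### Multimodal Format (with Images)

Image items use the same `messages` structure and may appear only in user messages:

```json
{"messages": [{"role": "user", "content": [{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}]}]}
```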

### Validation Rules

The file validation system enforces the following rules:

| Rule | Error | Source |
|------|-------|--------|
| File must be valid JSONL | `InvalidFileFormatError` | `utils/files.py` |
| Content must be a list of dicts | `InvalidFileFormatError` | `utils/files.py` |
| Each item must have `type` field | `InvalidFileFormatError` | `utils/files.py` |
| Text items must have `text` field (string) | `InvalidFileFormatError` | `utils/files.py` |
| Image items must be in user messages only | `InvalidFileFormatError` | `utils/files.py` |
| Image items must have `image_url` dict | `InvalidFileFormatError` | `utils/files.py` |

Source: [src/together/utils/files.py](https://github.com/togethercomputer/together-python/blob/main/src/together/utils/files.py#L20-L60)
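To catch problems before uploading, the content rules in the table can be approximated locally. This is a simplified sketch and not the SDK's validator: `validate_content_parts` is a hypothetical function, it covers list-style message content only, and it raises a plain `ValueError` instead of `InvalidFileFormatError`.

```python
from typing import Any


def validate_content_parts(content: Any, role: str) -> None:
    """Raise ValueError if list-style message content breaks a rule from the table."""
    if not isinstance(content, list) or not all(isinstance(p, dict) for p in content):
        raise ValueError("content must be a list of dicts")

    for index, part in enumerate(content):
        part_type = part.get("type")
        if part_type is None:
            raise ValueError(f"item {index} is missing the 'type' field")

        if part_type == "text":
            if not isinstance(part.get("text"), str):
                raise ValueError(f"item {index}: 'text' must be a string")
        elif part_type == "image_url":
            if role != "user":
                raise ValueError(f"item {index}: images are only allowed in user messages")
            if not isinstance(part.get("image_url"), dict):
                raise ValueError(f"item {index}: 'image_url' must be a dict")
```

Running this over every message of every line gives early feedback, while the SDK's own check remains the authority on what is accepted.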

## Error Handling

The SDK defines its exception types in `together.error`. The ones most relevant to fine-tuning are:

### Exception Types

| Exception | Use Case |
|-----------|----------|
| `TogetherException` | Base exception class |
| `RateLimitError` | API rate limit exceeded |
| `FileTypeError` | Invalid file format |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout |

Source: [src/together/error.py](https://github.com/togethercomputer/together-python/blob/main/src/together/error.py)

### Error Response Model

```python
from together.types.error import TogetherErrorResponse

error_response = TogetherErrorResponse(
    message="Invalid training file format",
    type="validation_error",
    param="training_file",
    code="INVALID_FORMAT"
)
```

### Handling Errors

```python
from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"Fine-tuning error: {e}")
```
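Rather than only printing a message, rate-limited calls can be retried with exponential backoff. The helper below is a generic sketch under stated assumptions: `call_with_retry` is hypothetical and not provided by the SDK, and the attempt count and delays are illustrative defaults.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    *,
    retry_on: Tuple[Type[Exception], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

With the imports from the example above, a call would look like `call_with_retry(lambda: client.fine_tuning.create(model=..., training_file=...), retry_on=(RateLimitError,))`. Other `TogetherException` subclasses still propagate immediately.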

## Legacy API

The SDK provides a backward-compatible `Finetune` wrapper class in the legacy module:

```python
from together.legacy.finetune import Finetune

# These class methods are deprecated but still functional
response = Finetune.create(
    training_file="file-abc123",
    model="meta-llama/Llama-3-8b-hf",
    n_epochs=3,
)
```

> ⚠️ **Warning**: The legacy functions emit deprecation warnings. Migrate to the new `client.fine_tuning` interface for new projects.

Source: [src/together/legacy/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/legacy/finetune.py)

## Common Patterns

### Resuming from Checkpoint

Continue training from a previous fine-tuning job:

```python
response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    from_checkpoint="ft-previous-job:1000",  # Resume from step 1000
)
```

### Fine-tuning from HuggingFace Model

Start training from a HuggingFace Hub model:

```python
response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    from_hf_model="username/my-finetuned-model",
    hf_model_revision="v1.0",
)
```

### Training with Price Limits

The SDK includes price estimation to prevent unexpected costs:

```python
price_estimation = client.fine_tuning.estimate_price(
    training_file="file-abc123",
    model="meta-llama/Llama-3-8b-hf",
    n_epochs=3,
    training_type="lora",
)

if price_estimation.allowed_to_proceed:
    # Submit the job with the same settings used for the estimate
    response = client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
        n_epochs=3,
        lora=True,
    )
else:
    print(f"Estimated cost ${price_estimation.estimated_total_price} exceeds limit")
```

Source: [src/together/resources/finetune.py](https://github.com/togethercomputer/together-python/blob/main/src/together/resources/finetune.py#L120-L140)

## Price Estimation

The price estimation feature helps users understand the expected cost before starting a fine-tuning job:

```mermaid
graph LR
    A[User Creates Job] --> B{from_checkpoint or from_hf_model?}
    B -->|No| C[Estimate Price]
    B -->|Yes| D[Skip Estimation]
    C --> E{Cost within limits?}
    E -->|Yes| F[Submit Job]
    E -->|No| G[Show Warning]
    D --> F
```

Price estimation is automatically performed when creating jobs without a checkpoint or HuggingFace model source, unless explicitly disabled.
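The decision flow in the diagram can be written down as two small functions. This is an illustrative sketch of the logic only, with hypothetical names (`needs_price_estimate`, `next_step`). It is not the SDK's internal implementation.

```python
from typing import Optional


def needs_price_estimate(
    from_checkpoint: Optional[str], from_hf_model: Optional[str]
) -> bool:
    """Estimation applies only when the job starts from neither source."""
    return from_checkpoint is None and from_hf_model is None


def next_step(
    from_checkpoint: Optional[str],
    from_hf_model: Optional[str],
    allowed_to_proceed: Optional[bool],
) -> str:
    """Return "submit" or "warn", following the branches of the flowchart."""
    if not needs_price_estimate(from_checkpoint, from_hf_model):
        return "submit"  # estimation is skipped entirely
    if allowed_to_proceed is None:
        raise ValueError("a price estimate is required for this job")
    return "submit" if allowed_to_proceed else "warn"
```

Here `allowed_to_proceed` stands for the flag returned by the price estimate, as used in the price-limit example earlier.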

## See Also

- [Chat Completions](#page-chat-completions) - Using fine-tuned models for inference
- [Files API](#page-files-api) - Uploading training and validation datasets
- [Installation and Setup](#page-installation) - SDK installation and authentication

---


## Pitfall Log


Project: togethercomputer/together-python

Summary: 12 potential pitfalls were found, 0 of them high/blocking. Highest priority: capability pitfall - capability assessment relies on assumptions.

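## 1. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- Impact on users: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Safeguard: Assumptions must be converted into verification items and must not be stated as fact until verified.
- Evidence: capability.assumptions | github_repo:624113979 | https://github.com/togethercomputer/together-python | README/documentation is current enough for a first validation pass.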

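## 2. Runtime pitfall · Source evidence: v.1.5.31

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified runtime-related issue for this project: v.1.5.31
- Impact on users: May block installation or first run.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_ec69ff431b15448799cc64a826efd011 | https://github.com/togethercomputer/together-python/releases/tag/v.1.5.31 | The source discussion mentions Python-related conditions; re-check before installing or trying it out.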

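## 3. Runtime pitfall · Source evidence: v.1.5.33

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified runtime-related issue for this project: v.1.5.33
- Impact on users: May increase the cost of trial use for new users and of production integration.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_8246897bffc548df9673fe0c390cb514 | https://github.com/togethercomputer/together-python/releases/tag/v.1.5.33 | The source discussion mentions Python-related conditions; re-check before installing or trying it out.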

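## 4. Runtime pitfall · Source evidence: v1.5.28

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified runtime-related issue for this project: v1.5.28
- Impact on users: May increase the cost of trial use for new users and of production integration.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_7e7977d7006e43939b5ceb03d0efad33 | https://github.com/togethercomputer/together-python/releases/tag/v1.5.28 | The source discussion mentions Python-related conditions; re-check before installing or trying it out.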

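## 5. Maintenance pitfall · Source evidence: v.1.5.29

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified maintenance/version-related issue for this project: v.1.5.29
- Impact on users: May increase the cost of trial use for new users and of production integration.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_e91b0421f8fe42ac815a709d90dcca27 | https://github.com/togethercomputer/together-python/releases/tag/v.1.5.29 | The source discussion mentions Python-related conditions; re-check before installing or trying it out.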

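## 6. Maintenance pitfall · Source evidence: v1.5.27

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified maintenance/version-related issue for this project: v1.5.27
- Impact on users: May increase the cost of trial use for new users and of production integration.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_569e5bacb3584ae794141c49af91c5db | https://github.com/togethercomputer/together-python/releases/tag/v1.5.27 | The source discussion mentions Python-related conditions; re-check before installing or trying it out.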

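## 7. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed was not recorded.
- Impact on users: New, abandoned, and active projects get lumped together, which lowers confidence in recommendations.
- Suggested check: Add signals from recent GitHub commits, releases, and issue/PR responses.
- Safeguard: While maintenance activity is unknown, the recommendation must not be rated as high-trust.
- Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | last_activity_observed missing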

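## 8. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: Downstream has already requested a review, so this must not be played down on the page.
- Suggested check: Add to the security/permissions governance review queue.
- Safeguard: While downstream risk exists, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium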

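## 9. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.
- Safeguard: Scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium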

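## 10. Security/permissions pitfall · Source evidence: `LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified security/permissions-related issue for this project: `LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Suggested check: The source issue is still open; the Pack Agent needs to re-check whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_4b91e5a8164f4aa9910f3d9737f1995c | https://github.com/togethercomputer/together-python/issues/443 | The source discussion mentions Python-related conditions; re-check before installing or trying it out.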

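## 11. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- Impact on users: Users cannot tell whether anyone will maintain the project when they run into a problem.
- Suggested check: Sample recent issues/PRs to see whether they go unhandled for long periods.
- Safeguard: While issue/PR responsiveness is unknown, a maintenance risk notice is required.
- Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | issue_or_pr_quality=unknown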

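## 12. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- Impact on users: Install commands and documentation may lag behind the code, raising the chance that users hit problems.
- Suggested check: Confirm that the latest release/tag matches the README install command.
- Safeguard: When release cadence is unknown or stale, the installation instructions must be flagged as possibly out of date.
- Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | release_recency=unknown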
