together-python Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

together-python

Related topics: Installation and Setup, Client Architecture

Overview

The together-python repository is an official Python SDK and Command Line Interface (CLI) for interacting with the Together AI API. It provides developers with programmatic access to a wide range of large language models (LLMs), image generation models, embedding services, and fine-tuning capabilities hosted on the Together platform.

Source: README.md

Source: https://github.com/togethercomputer/together-python / Human Manual

Installation and Setup

This page covers the complete installation and setup process for the Together Python SDK (together-python), including prerequisites, configuration options, CLI setup, and development environment configuration.

Overview

The together-python SDK provides a Python interface and command-line tool for interacting with the Together AI API. It enables developers to:

Access chat completions with support for multimodal inputs (text and images)
Generate text completions
Create and manage fine-tuning jobs
Generate images
Compute embeddings and reranking
Manage files and model resources

Source: README.md

Prerequisites

Python Version Requirements

The SDK requires Python 3.10 or higher. The project uses modern Python features including type hints and async/await patterns.

API Key

A valid Together AI API key is required for all API operations. You can obtain an API key by:

Creating an account at api.together.ai
Navigating to the API keys settings page

Source: README.md

Installation Methods

Standard Installation (pip)

Install the latest stable release from PyPI:

pip install together

Poetry Installation

For projects using Poetry as the dependency manager:

poetry add together

Source: CONTRIBUTING.md

Development Installation

For contributors who want to modify the source code or run tests locally:

# Clone the repository
git clone https://github.com/togethercomputer/together-python.git
cd together-python

# Install with development dependencies
poetry install --with quality,tests

Source: CONTRIBUTING.md

Configuration

Environment Variables

The SDK supports configuration through environment variables. The primary variable required is:

Environment Variable	Description	Required
`TOGETHER_API_KEY`	Your Together AI API key	Yes

#### Setting the API Key

Unix/Linux/macOS:

export TOGETHER_API_KEY=xxxxx

Windows (Command Prompt):

set TOGETHER_API_KEY=xxxxx

Windows (PowerShell):

$env:TOGETHER_API_KEY="xxxxx"

Source: README.md

Client Configuration

The Python client can be initialized with or without an explicit API key:

Using environment variable (recommended):

from together import Together

client = Together()  # Automatically reads TOGETHER_API_KEY

Explicit API key:

from together import Together

client = Together(api_key="your-api-key-here")

Source: README.md
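The lookup order implied above can be sketched as follows. This is a minimal illustration, not the SDK's code. The helper name `resolve_api_key` is hypothetical. Letting an explicit argument take precedence over `TOGETHER_API_KEY` is our assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to TOGETHER_API_KEY."""
    if api_key:
        return api_key
    env_key = environ.get("TOGETHER_API_KEY")
    if env_key:
        return env_key
    raise ValueError(
        "No API key found: pass api_key=... or set the TOGETHER_API_KEY environment variable."
    )


# Usage with an injected environment, so the example does not touch the real one.
print(resolve_api_key(environ={"TOGETHER_API_KEY": "env-key"}))           # env-key
print(resolve_api_key("explicit-key", environ={"TOGETHER_API_KEY": "x"}))  # explicit-key
```

Injecting `environ` keeps the helper testable without mutating process-wide state.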

Optional Dependencies

The SDK uses Poetry for dependency management. Some optional dependencies are installed through a Poetry dependency group:

Dependency group	Purpose	Install Command
`extended_testing`	Additional testing dependencies	`poetry install --with extended_testing`

When adding new dependencies, maintainers follow a strict policy: dependencies should be optional and users who don't have them installed should be able to import the SDK without warnings or errors.

Source: CONTRIBUTING.md
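A common way to meet this optional-dependency policy is to defer the import until the feature is actually used. That way, importing the package never fails. Below is a minimal sketch of the pattern. `require_optional` and `read_parquet_rows` are hypothetical helpers, not part of the SDK.

```python
import importlib
from types import ModuleType
from typing import Any, List


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency at call time, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"This feature needs the optional dependency '{module_name}'. {install_hint}"
        ) from exc


def read_parquet_rows(path: str) -> List[Any]:
    """Hypothetical feature: pyarrow is only imported when this function runs."""
    pq = require_optional("pyarrow.parquet", "Install it with: pip install pyarrow")
    return pq.read_table(path).to_pylist()
```

Importing the module that defines `read_parquet_rows` succeeds even without pyarrow installed. Only calling the feature raises the error.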

CLI Setup

Installation

The CLI is included with the main package installation. After installing together, the together command becomes available.

Verification

Verify the CLI installation:

together --help

Common CLI Commands

Command	Description
`together chat.completions`	Chat completions
`together completions`	Text completions
`together images`	Image generation
`together files`	File management
`together fine-tuning`	Fine-tuning operations
`together models`	List and manage models

Source: README.md

Client Initialization Patterns

Synchronous Client

from together import Together

client = Together()

Asynchronous Client

from together import AsyncTogether

async_client = AsyncTogether()

Basic Usage Flow

graph TD
    A[Install together package] --> B[Set TOGETHER_API_KEY]
    B --> C[Import Together or AsyncTogether]
    C --> D[Initialize client]
    D --> E[Call API methods]
    E --> F[Process response]

SDK Constants

The SDK defines several constants in src/together/constants.py:

Constant	Purpose
API base URLs	Endpoint configurations
Default timeouts	Request timeout values
Version information	SDK version tracking

Source: src/together/constants.py

Error Handling Setup

The SDK provides a comprehensive error hierarchy for handling API-related issues:

Exception Types

Exception Class	Purpose
`TogetherException`	Base exception class
`RateLimitError`	Handle rate limiting
`FileTypeError`	File format validation errors
`APIConnectionError`	Network connectivity issues
`Timeout`	Request timeout handling
`AuthenticationError`	Invalid API key errors

Source: src/together/error.py

Error Response Format

API errors are returned with structured information:

class TogetherErrorResponse(BaseModel):
    message: str | None = None                     # Error message
    type_: str | None = Field(None, alias="type")  # Error type (JSON key "type")
    param: str | None = None                       # Parameter causing error
    code: str | None = None                        # Error code

Source: src/together/types/error.py

Error Handling Example

from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"API error: {e}")

Known Compatibility Issues

Typer Version Conflict

Note: The SDK has a dependency constraint on typer<0.16.0. If your project requires typer>=0.16.0, you may encounter dependency conflicts. See Issue #348 for tracking.

This is a known community issue where projects depending on newer typer versions cannot use together-python without resolving the conflict.

Pillow Version

Note: The SDK's image processing may have transitive dependency issues with pillow>=11.0.0 when used alongside libraries like autogen 0.4.2. See Issue #237 for details.

Development Environment Setup

1. Install Poetry

Follow the official Poetry installation guide.

Important: If you use Conda or Pyenv, create and activate a new environment before installing Poetry:

```bash
conda create -n together python=3.10
conda activate together
```

2. Configure Poetry

Tell Poetry to use the active Python environment:

poetry config virtualenvs.prefer-active-python true

3. Install Dependencies

poetry install --with quality,tests

4. Set Up Pre-commit Hooks

The project uses pre-commit for auto-formatting and linting:

pre-commit install

Source: CONTRIBUTING.md

Running Tests

#### Unit Tests

make tests

#### Integration Tests

Warning: Integration tests require an active API key and will incur charges.

make integration_tests

Source: CONTRIBUTING.md

Formatting and Linting

Before submitting changes, run formatting locally:

make format

The CI system automatically checks formatting, linting, and tests.

Source: CONTRIBUTING.md

Quick Start Checklist

Step	Task	Command/Action
1	Check Python version	`python --version` (requires 3.10+)
2	Install SDK	`pip install together`
3	Set API key	`export TOGETHER_API_KEY=xxxxx`
4	Verify installation	`python -c "from together import Together; print('OK')"`
5	Test basic call	Run a simple chat completion
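Steps 1 and 3 of the checklist can be automated with a small preflight script, run before making a billable API call. This is a sketch using only the standard library. The function name and messages are illustrative. It checks only that a key is set, not that the key is valid.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def preflight(
    version: Tuple[int, int] = sys.version_info[:2],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of setup problems; an empty list means the checks passed."""
    env = os.environ if environ is None else environ
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if not env.get("TOGETHER_API_KEY"):
        problems.append("TOGETHER_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("FAIL:", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```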

Client Architecture

Related topics: Chat Completions, Type System

Overview

The Together Python SDK provides a unified interface for interacting with the Together AI platform through both a programmatic Python client and a command-line interface (CLI). The client architecture follows a layered design pattern that separates concerns between API communication, resource management, and user-facing interfaces.

The architecture is designed to support multiple API capabilities including chat completions, text completions, embeddings, image generation, file management, and fine-tuning operations. Source: src/together/resources/__init__.py

Core Components

The SDK architecture consists of three primary layers that work together to provide a seamless developer experience:

graph TD
    A[User Application] --> B[Together Client]
    B --> C[Resource Modules]
    C --> D[API Requestor]
    D --> E[Together AI API]
    E --> D
    D --> B
    B --> A
    
    F[CLI Commands] --> B
    
    subgraph Resources
        G[Chat Completions]
        H[Completions]
        I[Embeddings]
        J[Images]
        K[Files]
        L[Fine-tuning]
    end
    
    C --> G
    C --> H
    C --> I
    C --> J
    C --> K
    C --> L

Together Client Class

The Together class serves as the main entry point for the SDK. It provides a synchronous interface for all API operations and manages the underlying HTTP client configuration.

Key Responsibilities:

Initialization and configuration of API credentials
Delegation of requests to appropriate resource modules
Streaming response handling
Timeout and connection management

Source: src/together/client.py

Basic Initialization:

from together import Together

# Using environment variable (TOGETHER_API_KEY)
client = Together()

# Explicit API key
client = Together(api_key="your-api-key-here")

API Requestor

The APIRequestor class handles the low-level communication with the Together AI API. It abstracts away HTTP details and provides a consistent interface for both synchronous and asynchronous operations.

Requestor Responsibilities:

Constructing HTTP requests with proper authentication headers
Handling request serialization and response parsing
Managing streaming responses
Implementing retry logic for transient failures
Processing error responses into typed exceptions

Source: src/together/abstract/api_requestor.py
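The last responsibility, turning error responses into typed exceptions, can be pictured as a lookup from HTTP status code to exception class. This is a self-contained sketch. The classes below are local stand-ins named after the SDK's exceptions. The specific status-to-class mapping is an assumption for illustration, not taken from `api_requestor.py`.

```python
from typing import Any, Dict, Optional, Type


class TogetherException(Exception):
    """Local stand-in for the SDK's base exception (not the real class)."""

    def __init__(self, message: str = "", http_status: Optional[int] = None) -> None:
        super().__init__(message)
        self.http_status = http_status


class AuthenticationError(TogetherException):
    pass


class RateLimitError(TogetherException):
    pass


class InvalidRequestError(TogetherException):
    pass


# Assumed mapping, for illustration only; unknown statuses fall back to the base class.
STATUS_TO_ERROR: Dict[int, Type[TogetherException]] = {
    400: InvalidRequestError,
    401: AuthenticationError,
    429: RateLimitError,
}


def error_from_response(status: int, body: Dict[str, Any]) -> TogetherException:
    """Build a typed exception from an error status and a parsed JSON body."""
    details = body.get("error")
    if not isinstance(details, dict):  # assumed: details may be nested under "error" or not
        details = body
    message = details.get("message") or f"HTTP {status}"
    cls = STATUS_TO_ERROR.get(status, TogetherException)
    return cls(message, http_status=status)


exc = error_from_response(429, {"error": {"message": "Too many requests"}})
print(type(exc).__name__, exc.http_status, exc)  # RateLimitError 429 Too many requests
```

Because every class derives from the base, callers can catch the specific subclass first and fall back to `TogetherException`.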

Resource Modules

Resource modules encapsulate API operations by domain. Each resource module provides type-safe methods for a specific category of API endpoints.

Resource Module	Purpose	Key Methods
`chat.completions`	Chat-based language model interactions	`create()`, streaming variants
`completions`	Text completion operations	`create()`, streaming variants
`embeddings`	Text embedding generation	`create()`
`images`	Image generation	`generate()`
`files`	File upload, retrieval, and management	`upload()`, `retrieve()`, `list()`, `delete()`
`fine_tuning`	Model fine-tuning operations	`create()`, `retrieve()`, `list()`, `cancel()`, `download()`

Source: src/together/resources/__init__.py
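To make the layering concrete, here is a toy model of the delegation path. The client exposes resource namespaces, each resource builds its own request, and a single requestor performs transport. Every class here is hypothetical and deliberately simplified, with no HTTP and no auth. A recording requestor stands in for `APIRequestor` so the flow runs offline.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Request:
    method: str
    url: str
    params: Dict[str, Any]


@dataclass
class RecordingRequestor:
    """Stand-in for the API requestor: records requests and returns a canned reply."""
    sent: List[Request] = field(default_factory=list)

    def request(self, req: Request) -> Dict[str, Any]:
        self.sent.append(req)
        return {"id": "demo", "object": req.url}


class ChatCompletions:
    def __init__(self, requestor: RecordingRequestor) -> None:
        self._requestor = requestor

    def create(self, *, model: str, messages: List[Dict[str, Any]], **params: Any) -> Dict[str, Any]:
        # Resource layer: assemble endpoint-specific parameters, then delegate transport.
        payload = {"model": model, "messages": messages, **params}
        return self._requestor.request(Request("POST", "chat/completions", payload))


class Chat:
    def __init__(self, requestor: RecordingRequestor) -> None:
        self.completions = ChatCompletions(requestor)


class ToyClient:
    """Entry point: owns one requestor and shares it across resource namespaces."""

    def __init__(self) -> None:
        self.requestor = RecordingRequestor()
        self.chat = Chat(self.requestor)


toy = ToyClient()
toy.chat.completions.create(model="m", messages=[{"role": "user", "content": "hi"}])
print(toy.requestor.sent[0].url)  # chat/completions
```

Sharing one requestor across namespaces is what lets a single client reuse its configuration and connections for every resource.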

Request/Response Flow

Synchronous Request Flow

sequenceDiagram
    participant App as Application Code
    participant Client as Together Client
    participant Resource as Resource Module
    participant Requestor as API Requestor
    participant API as Together AI API

    App->>Client: client.chat.completions.create(...)
    Client->>Resource: delegating request
    Resource->>Resource: build request parameters
    Resource->>Requestor: request()
    Requestor->>API: POST /chat/completions
    API-->>Requestor: JSON Response
    Requestor->>Resource: parse response
    Resource-->>Client: typed response object
    Client-->>App: ChatCompletionResponse

Streaming Response Handling

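The SDK supports server-sent events (SSE) streaming for real-time token delivery. Passing stream=True returns an iterator of chunk objects instead of a single response. The chunk type depends on the endpoint: ChatCompletionChunk for chat completions and CompletionChunk for text completions.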

Chat Completions Streaming:

from together import Together

client = Together()
stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Source: src/together/resources/chat/completions.py

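With the synchronous Together client, iterating over the stream yields ChatCompletionChunk objects one at a time. With AsyncTogether, the stream is an async generator of the same chunk type and is consumed with `async for`.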
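Each chunk carries only a fragment in `delta.content`, so reconstructing the full reply means concatenating fragments as they arrive. The sketch below does this for any iterable of chunk-shaped objects. `SimpleNamespace` objects stand in for `ChatCompletionChunk` so it runs without network access. Skipping chunks with an empty `choices` list is a defensive assumption.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def collect_stream(chunks: Iterable[Any], echo: bool = False) -> str:
    """Concatenate delta.content fragments from a chat-completion stream."""
    parts: List[str] = []
    for chunk in chunks:
        if not chunk.choices:  # defensive: skip chunks that carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
            if echo:
                print(piece, end="", flush=True)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    # Mimics the attribute shape chunk.choices[0].delta.content
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None), SimpleNamespace(choices=[])]
print(collect_stream(fake_stream))  # Hello
```

With the real client, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream`.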

Asynchronous Support

The SDK provides AsyncTogether for applications requiring concurrent API operations:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def concurrent_requests():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Prompt {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    return responses
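As written, `asyncio.gather` fires every request at once, which can trip rate limits for larger batches. A common refinement is to cap in-flight requests with `asyncio.Semaphore`. In this sketch, `fake_call` is a placeholder coroutine standing in for `async_client.chat.completions.create`. The limit of 2 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: List[Callable[[], Awaitable[T]]], limit: int
) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time, preserving order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here while `limit` calls are in flight
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories))


async def fake_call(i: int) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response {i}"


results = asyncio.run(gather_limited([lambda i=i: fake_call(i) for i in range(5)], limit=2))
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, not in completion order.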

Error Handling

The SDK defines a hierarchy of exception types for different error conditions, enabling precise error handling in application code.

Exception Hierarchy

TogetherException (base)
├── RateLimitError
├── FileTypeError
├── AttributeError
├── Timeout
├── APIConnectionError
├── InvalidRequestError
└── AuthenticationError

Source: src/together/error.py

Error Response Model

API error responses are parsed into structured TogetherErrorResponse objects:

Field	Type	Description
`message`	`str | None`	Human-readable error message
`type`	`str | None`	Error category/type
`param`	`str | None`	Parameter that caused the error
`code`	`str | None`	Machine-readable error code

Source: src/together/types/error.py

Error Handling Example

from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limited: {e}")
except TogetherException as e:
    print(f"API error: {e}")

File Validation Architecture

The SDK includes robust file validation for fine-tuning datasets, ensuring data integrity before upload.

graph LR
    A[Input File] --> B{File Type Check}
    B -->|JSONL| C[JSONL Validator]
    B -->|JSON| D[JSON Validator]
    C --> E{Content Validation}
    D --> E
    E --> F[Schema Validation]
    F --> G[Size Limits Check]
    G --> H[Upload Ready]
    E -->|Invalid| I[InvalidFileFormatError]

Validation Rules

The file validation system enforces the following constraints:

Rule	Limit	Description
Maximum base64 image size	10MB	Per image in multimodal datasets
Maximum images per example	5	Images allowed in a single training example
Required fields	`type`, `content`	For each message in multimodal format

Source: src/together/utils/files.py

Supported Content Types

Type	Description	Role Restrictions
`text`	Plain text content	Any role
`image_url`	Base64-encoded image	User role only
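The rules in the two tables above can be checked locally before upload. The sketch below validates one JSONL example of the form `{"messages": [...]}`. It is a simplified mirror of the listed rules, not the SDK's validator, and it raises a plain `ValueError` rather than `InvalidFileFormatError`. Two details are assumptions: the `messages` wrapper key, and reading "10MB" as 10 × 1024 × 1024 base64 characters.

```python
import json
from typing import Any, NoReturn

MAX_BASE64_IMAGE_CHARS = 10 * 1024 * 1024  # assumed reading of the 10MB limit
MAX_IMAGES_PER_EXAMPLE = 5


def validate_example(line: str, line_number: int) -> None:
    """Raise ValueError naming the first rule a JSONL training example breaks."""

    def fail(reason: str) -> NoReturn:
        raise ValueError(f"line {line_number}: {reason}")

    example: Any = json.loads(line)
    if not isinstance(example, dict) or not isinstance(example.get("messages"), list):
        fail("expected an object with a `messages` list")

    images = 0
    for message in example["messages"]:
        if not isinstance(message, dict):
            fail("each message must be an object")
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-text content is allowed for any role
        if not isinstance(content, list):
            fail("`content` must be a string or a list of dicts")
        for item in content:
            if not isinstance(item, dict) or "type" not in item:
                fail("each content item must be a dict with a `type` field")
            if item["type"] == "text":
                if not isinstance(item.get("text"), str):
                    fail("a `text` item needs a string `text` field")
            elif item["type"] == "image_url":
                if message.get("role") != "user":
                    fail("only user messages can contain images")
                image = item.get("image_url")
                url = image.get("url") if isinstance(image, dict) else None
                if not isinstance(url, str):
                    fail("`image_url` must be a dict with a string `url`")
                if len(url) > MAX_BASE64_IMAGE_CHARS:
                    fail("image exceeds the base64 size limit")
                images += 1
            else:
                fail(f"unsupported content type {item['type']!r}")

    if images > MAX_IMAGES_PER_EXAMPLE:
        fail(f"{images} images exceed the limit of {MAX_IMAGES_PER_EXAMPLE}")


good = json.dumps({"messages": [
    {"role": "user", "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}},
    ]},
    {"role": "assistant", "content": "A very small PNG."},
]})
validate_example(good, line_number=1)  # passes: no exception

bad = json.dumps({"messages": [
    {"role": "assistant", "content": [{"type": "image_url", "image_url": {"url": "x"}}]},
]})
```

Passing each file line with its 1-based index reproduces the `line_number` reporting style of the SDK's own errors.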

Fine-tuning Architecture

The fine-tuning module provides comprehensive support for training custom models on the Together platform.

Training Methods

The SDK supports multiple fine-tuning methodologies:

Method	Description	Checkpoint Types
Full training	Updates all model weights	Default only
LoRA	Low-rank adaptation	Default, Merged, Adapter
DPO	Direct Preference Optimization	Default
SimPO	Simple Preference Optimization	Default
RPO	Reward Preference Optimization	Default

Source: src/together/resources/finetune.py

Checkpoint Management

The fine-tuning resource handles checkpoint retrieval and download:

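from together import Together
from together.types.finetune import DownloadCheckpointType  # import path assumed

client = Together()
fine_tune_id = "your-fine-tune-id"  # placeholder: the ID of an existing job

# List available checkpoints
checkpoints = client.fine_tuning.retrieve_checkpoints(fine_tune_id)

# Download a specific checkpoint
result = client.fine_tuning.download(
    fine_tune_id,
    output="./checkpoints",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED
)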

CLI Architecture

The command-line interface is built using Click and mirrors the Python client functionality.

CLI Command Structure

together
├── chat.completions
├── completions
├── embeddings
├── files
│   ├── check
│   ├── upload
│   ├── list
│   ├── retrieve
│   └── delete
├── fine-tuning
│   ├── create
│   ├── list
│   ├── retrieve
│   ├── cancel
│   ├── download
│   └── delete
└── models
    ├── list
    └── start

Source: src/together/cli/api/chat.py and src/together/cli/api/finetune.py

CLI Configuration

The CLI supports environment variable configuration:

# Set API key
export TOGETHER_API_KEY=your-api-key

# Use CLI
together chat.completions \
    --message "user" "Hello, world!" \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct

Known Issues and Limitations

Dependency Compatibility

Issue #348: The SDK has a dependency constraint on typer<0.16.0, which may conflict with projects requiring newer versions of typer. This can cause dependency resolution failures in environments where multiple packages have conflicting typer requirements.

Issue #237: The pillow dependency version may conflict with transitive dependencies from other packages like autogen>=0.4.2 that require pillow>=11.0.0.

Model Type Validation

Issue #337: The ModelObject type definition may not include all valid model types, potentially causing Pydantic validation errors when working with newer or specialized model types like transcription models.

Tool Response Handling

Issue #113: Multi-turn function calling workflows may encounter validation errors when processing tool response messages with role='tool'. Applications implementing function calling should ensure proper message formatting according to the Together AI API specification.

Best Practices

Connection Management

Reuse the Together client instance across multiple requests to benefit from connection pooling
Set appropriate timeout values for long-running operations like fine-tuning

Error Recovery

Implement exponential backoff for RateLimitError handling
Validate file contents locally before upload to avoid wasted API calls
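Exponential backoff retries a failed call after waits that double on each attempt. Random jitter is added so that many clients do not retry in lockstep. This is a generic sketch, and `retry_with_backoff` is a hypothetical helper. In real code you would pass `retry_on=(RateLimitError,)` from `together.error`; here a local exception and a fake call keep the example offline. The requestor already implements retry logic for transient failures (see API Requestor), so keep application-level attempts modest.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on `retry_on` with exponentially growing, jittered waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # "full jitter": wait anywhere in [0, delay]
    raise AssertionError("unreachable")  # the loop always returns or raises


class FakeRateLimit(Exception):
    pass


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeRateLimit("429")
    return "ok"


waits = []
print(retry_with_backoff(flaky, (FakeRateLimit,), sleep=waits.append))  # ok
print(len(waits))  # 2
```

Injecting `sleep` lets tests record the delays instead of actually waiting.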

Streaming Performance

Process streaming chunks incrementally rather than buffering entire responses
Use async variants (AsyncTogether) for applications making multiple concurrent requests

Type System

The together-python SDK employs a comprehensive type system built on Pydantic for data validation, serialization, and API interaction. This document provides a detailed reference for developers working with the SDK's type definitions, error handling, and validation patterns.

Overview

The type system serves three primary purposes within the together-python SDK:

Data Validation: Ensures API request parameters meet expected formats before transmission
Serialization: Converts Python objects to JSON for API communication and deserializes responses
IDE Support: Provides type hints for better developer experience and autocomplete

graph TD
    A[User Code] --> B[Pydantic Models]
    B --> C{Validation}
    C -->|Pass| D[API Request]
    C -->|Fail| E[Validation Error]
    D --> F[API Response]
    F --> G[Response Models]
    G --> H[User Code]

Base Types

Abstract Base Model

All SDK types inherit from BaseModel, which extends Pydantic's BaseModel with custom configuration:

# Source: src/together/types/abstract.py
from pydantic import BaseModel as PydanticBaseModel, ConfigDict


class BaseModel(PydanticBaseModel):
    """Base model for all Together API types."""
    
    model_config = ConfigDict(
        populate_by_name=True,
        validate_default=True,
        arbitrary_types_allowed=True,
    )

The BaseModel configures:

populate_by_name=True: Allows population by field name or alias
validate_default=True: Validates default values during initialization
arbitrary_types_allowed=True: Permits custom type annotations

Error Response Model

The TogetherErrorResponse type defines the structure for API error responses:

Field	Type	Description
`message`	`str | None`	Human-readable error message
`type`	`str | None`	Error category/type
`param`	`str | None`	Parameter that caused the error
`code`	`str | None`	Machine-readable error code

# Source: src/together/types/error.py
class TogetherErrorResponse(BaseModel):
    message: str | None = None
    type_: str | None = Field(None, alias="type")
    param: str | None = None
    code: str | None = None
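The `alias="type"` lets the model read the JSON key `type` while exposing it as `type_`, which avoids clashing with the builtin name. Below is the same mapping in plain Python, as a dependency-free sketch. `ErrorDetails` and `parse_error` are hypothetical, and the `{"error": {...}}` envelope is an assumption about the wire format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorDetails:
    message: Optional[str] = None
    type_: Optional[str] = None  # populated from the JSON key "type"
    param: Optional[str] = None
    code: Optional[str] = None


def parse_error(raw: str) -> ErrorDetails:
    """Parse an error body, accepting details nested under "error" or at the top level."""
    body = json.loads(raw)
    details = body.get("error") if isinstance(body.get("error"), dict) else body
    return ErrorDetails(
        message=details.get("message"),
        type_=details.get("type"),  # alias mapping: "type" -> type_
        param=details.get("param"),
        code=details.get("code"),
    )


err = parse_error('{"error": {"message": "Invalid model", "type": "invalid_request_error", "param": "model"}}')
print(err.type_, err.param)  # invalid_request_error model
```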

Exception Hierarchy

The SDK defines a hierarchical exception system for granular error handling:

graph TD
    A[TogetherException<br/>Base Exception] --> B[RateLimitError]
    A --> C[FileTypeError]
    A --> D[AttributeError]
    A --> E[Timeout]
    A --> F[APIConnectionError]
    A --> G[InvalidRequestError]
    A --> H[AuthenticationError]
    A --> I[APIResponseError]

Exception Types

Exception Class	Purpose	Common Cause
`TogetherException`	Base exception for all SDK errors	General failures
`RateLimitError`	API rate limit exceeded	Too many requests
`FileTypeError`	Invalid file type submitted	Unsupported file format
`AttributeError`	Invalid attribute access	Missing or invalid parameter
`Timeout`	Request timeout	Slow network or API
`APIConnectionError`	Network connectivity issue	Connection failure
`InvalidRequestError`	Malformed request	Invalid parameters
`AuthenticationError`	Authentication failure	Invalid API key
`APIResponseError`	Unexpected API response	Server-side error

# Source: src/together/error.py
class RateLimitError(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)

Exception Construction Pattern

All exception types accept flexible message parameters:

# Source: src/together/error.py
class Timeout(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)

The message can be:

TogetherErrorResponse: Parsed API error response
Exception: Wrapped exception
str: Direct error message
RequestException: HTTP request exception
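One plausible way to reduce that union to a single display string is sketched below. It illustrates the idea under stated assumptions and is not the SDK's implementation. The error-response case is modelled as a plain `dict` serialized without its `None` fields. The optional status prefix format is invented for the example.

```python
import json
from typing import Any, Dict, Optional, Union

MessageInput = Union[Dict[str, Any], Exception, str, None]


def format_error_message(message: MessageInput, http_status: Optional[int] = None) -> str:
    """Normalize the accepted message types into one string (hypothetical helper)."""
    if isinstance(message, dict):
        # Stand-in for a parsed error response: keep only the fields that are set.
        text = json.dumps({k: v for k, v in message.items() if v is not None})
    elif message is None:
        text = ""
    else:
        text = str(message)  # covers str and any Exception, including RequestException
    return f"Error code: {http_status} - {text}" if http_status is not None else text


print(format_error_message({"message": "Rate limit", "type": None}, http_status=429))
# Error code: 429 - {"message": "Rate limit"}
```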

Request and Response Types

Chat Completions Types

The chat completions system uses structured types for requests and responses:

# Source: src/together/resources/chat/completions.py
response, _, _ = await requestor.arequest(
    options=TogetherRequest(
        method="POST",
        url="chat/completions",
        params=parameter_payload,
    ),
    stream=stream,
)

if stream:
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)

Streaming Response Types

Streaming responses yield ChatCompletionChunk objects:

Field	Type	Description
`choices`	`List[Choice]`	Generated completions
`model`	`str`	Model identifier
`id`	`str`	Request identifier
`usage`	`Usage`	Token usage statistics

File Validation Types

Content Item Types

The SDK validates file content for fine-tuning datasets:

# Source: src/together/utils/files.py
if item["type"] == "text":
    if "text" not in item or not isinstance(item["text"], str):
        raise InvalidFileFormatError(
            "The dataset is malformed, the `text` field must be present in the `content` item field and be"
            f" a string. Got '{item.get('text')!r}' instead.",
            line_number=idx + 1,
            error_source="key_value",
        )
elif item["type"] == "image_url":
    if role != "user":
        raise InvalidFileFormatError(
            "The dataset is malformed, only user messages can contain images.",
            line_number=idx + 1,
            error_source="key_value",
        )

Content Type Enumeration

Type	Valid Context	Description
`text`	Any role	Plain text content
`image_url`	User role only	Image URL reference

Common Issues and Troubleshooting

Validation Errors

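Pydantic validation errors can also surface when the SDK parses an API response whose data does not match its type definitions:

pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelObject
type
  Input should be 'chat', 'language', 'code', 'image', 'embedding',...

Resolution: In this example, the API returned a model whose `type` is not in the SDK's ModelObject definition (see Issue #337), for instance a transcription model listed by client.models.list(). Upgrading to an SDK release that recognizes the type should resolve it; until then, track the issue.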

Invalid File Format Errors

When uploading fine-tuning datasets, content validation enforces strict rules:

# Source: src/together/utils/files.py
if not isinstance(item, dict):
    raise InvalidFileFormatError(
        "The dataset is malformed, the `content` field must be a list of dicts.",
        line_number=idx + 1,
        error_source="key_value",
    )

Type Mismatch in Streaming

When processing streaming responses, type assertions ensure correct handling:

# Source: src/together/cli/api/completions.py
if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices

Type Annotations in CLI

The CLI uses Click decorators with type annotations for command-line argument validation:

# Source: src/together/cli/api/chat.py
@click.option(
    "--max-tokens",
    type=int,
    help="Max tokens to generate"
)
@click.option(
    "--temperature",
    type=float,
    help="Sampling temperature"
)
@click.option(
    "--stop",
    type=str,
    multiple=True,
    help="List of strings to stop generation"
)

CLI Type Conversion

CLI Option Type	Python Type	Notes
`type=int`	`int`	Integer values
`type=float`	`float`	Decimal values
`type=str`	`str`	String values
`multiple=True`	`tuple`	Multiple values
`is_flag=True`	`bool`	Boolean flags

Async Type Handling

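In the async resource, streaming returns an async generator of the same ChatCompletionChunk type, consumed with `async for`. Non-streaming calls return a ChatCompletionResponse: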

# Source: src/together/resources/chat/completions.py
if stream:
    assert not isinstance(response, TogetherResponse)
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)

Best Practices

Type Safety Guidelines

Use Response Models: Always use SDK response models instead of raw dictionaries
Validate Early: Check input types before API calls
Handle Exceptions: Catch specific exception types for targeted error handling
Use Type Hints: Enable IDE autocomplete with proper imports

Importing Types

from together.types.error import TogetherErrorResponse
from together.error import (
    TogetherException,
    RateLimitError,
    InvalidRequestError,
    Timeout,
)

Chat Completions

Related topics: Completions API, Client Architecture

The Chat Completions API provides a unified interface for interacting with large language models on the Together platform through conversational message-based interactions. This feature supports text-only and multimodal inputs, streaming responses, function calling, and various generation parameters to control model behavior.

Overview

The Chat Completions resource is the primary interface for conversational AI interactions in the together-python SDK. It follows the OpenAI-compatible chat completions format, enabling developers to switch between providers with minimal code changes while leveraging Together's distributed inference infrastructure.

graph TD
    A[Client Application] --> B[Together Client]
    B --> C[Chat Completions.create]
    C --> D[API Requestor]
    D --> E[Together API]
    E --> F[Model Inference]
    F --> G[Response]
    G --> D
    D --> B
    B --> H[ChatCompletionResponse]
    
    style A fill:#e1f5fe
    style H fill:#c8e6c9

Key capabilities include:

Text Completions: Standard conversational text generation with system, user, and assistant roles
Multimodal Input: Support for images alongside text in user messages
Streaming: Real-time token-by-token response streaming
Function Calling: Tool-use with structured function definitions and responses
Safety Controls: Built-in moderation model integration
Audio Support: Attach audio URLs to messages for Whisper-transcribed context

Source: src/together/resources/chat/completions.py:1-50

Installation and Setup

Environment Configuration

The SDK requires a Together API key for authentication. You can obtain one from the Together Playground settings page.

export TOGETHER_API_KEY=your_api_key_here

Client Initialization

from together import Together

# Using environment variable
client = Together()

# Explicit API key
client = Together(api_key="your_api_key_here")

# Custom base URL (e.g. a proxy or test server)
client = Together(
    api_key="your_api_key_here",
    base_url="https://api.together.xyz/v1"
)

Source: README.md

API Reference

Method Signature

ChatCompletions.create(
    model: str,
    messages: List[ChatCompletionMessageParam],
    frequency_penalty: Optional[float] = None,
    max_tokens: Optional[int] = None,
    n: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    stop: Optional[Union[str, List[str]]] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    min_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    logprobs: Optional[int] = None,
    echo: Optional[bool] = None,
    safety_model: Optional[str] = None,
    response_format: Optional[ResponseFormat] = None,
    tools: Optional[List[ChatCompletionToolParam]] = None,
    tool_choice: Optional[Union[ChatCompletionToolChoiceEnum, ChatCompletionNamedToolChoiceParam]] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    max_completion_tokens: Optional[int] = None,
) -> ChatCompletionResponse | Iterator[ChatCompletionChunk]  # an iterator of chunks when stream=True

Source: src/together/resources/chat/completions.py:1-50

Parameters

Parameter	Type	Required	Default	Description
`model`	`str`	Yes	-	Model identifier (e.g., `meta-llama/Llama-4-Scout-17B-16E-Instruct`)
`messages`	`List[ChatCompletionMessageParam]`	Yes	-	List of conversation messages with roles
`temperature`	`float`	No	`0.7`	Sampling temperature (0.0-2.0)
`top_p`	`float`	No	`1.0`	Nucleus sampling threshold
`top_k`	`int`	No	-	Top-k token selection
`min_p`	`float`	No	-	Minimum probability threshold
`max_tokens`	`int`	No	`256`	Maximum tokens to generate
`max_completion_tokens`	`int`	No	-	Alternative to max_tokens
`stream`	`bool`	No	`False`	Enable streaming response
`stop`	`str` or `List[str]`	No	-	Stop sequences
`n`	`int`	No	`1`	Number of completions to generate
`presence_penalty`	`float`	No	`0.0`	Penalize repeated tokens
`frequency_penalty`	`float`	No	`0.0`	Penalize frequent tokens
`repetition_penalty`	`float`	No	`1.0`	Token repetition penalty
`logprobs`	`int`	No	-	Return log probabilities
`echo`	`bool`	No	`False`	Echo prompt in response
`safety_model`	`str`	No	-	Moderation model identifier
`response_format`	`ResponseFormat`	No	-	Constrain output format (JSON schema)
`tools`	`List[ChatCompletionToolParam]`	No	-	Available function definitions
`tool_choice`	`str` or `dict`	No	`"auto"`	Tool selection strategy
`audio`	`ChatCompletionAudioParam`	No	-	Audio parameters for voice input
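Every option in the signature defaults to `None`. A natural way to build the request body is therefore to send only the parameters the caller actually set, leaving the rest to server-side defaults. This is a minimal sketch, and `build_chat_payload` is hypothetical. Two choices are our assumptions: omitting unset options rather than sending them as `null`, and normalizing `stop` to a list.

```python
from typing import Any, Dict, List, Optional, Union


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    stop: Optional[Union[str, List[str]]] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Assemble a chat-completions request body, omitting options left as None."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if stop is not None:
        payload["stop"] = [stop] if isinstance(stop, str) else list(stop)  # normalize to a list
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_chat_payload(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    [{"role": "user", "content": "Hi"}],
    stop="\n\n",
    temperature=0.2,
    max_tokens=None,  # unset: not sent
)
print(sorted(payload))  # ['messages', 'model', 'stop', 'temperature']
```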

Message Format

Message Roles

The chat completions API supports structured conversation turns through a role-based message system:

Role	Description	Content Type
`system`	Instructions and context	Text only
`user`	Human input	Text, images, or mixed
`assistant`	Model responses	Text and tool calls
`tool`	Function execution results	Text (JSON)
`developer`	Developer instructions	Text only

Message Structure

from typing import Any, Dict, List

from together import Together

client = Together()

# Each message is a dict with a "role" and a "content" key
messages: List[Dict[str, Any]] = [
    {
        "role": "system",
        "content": "You are a helpful coding assistant."
    },
    {
        "role": "user",
        "content": "Write a Python function to calculate factorial."
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages
)

print(response.choices[0].message.content)

Source: src/together/resources/chat/completions.py:1-50

Multimodal Messages

User messages can include both text and images using a content array:

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.png"
                }
            }
        ]
    }]
)

Image URL content items must follow specific validation rules. The image_url field must be a dictionary containing a url key with a valid URL string. Images are only permitted in user role messages.

Source: src/together/utils/files.py:1-50
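The two rules above can also be checked on the client before a request is sent. The sketch below is hypothetical: check_image_parts is not an SDK function, and the SDK's own validation may differ. It encodes only the rules stated here: each image_url item must be a dict with a string url, and images may appear only in user messages. "Valid URL" is approximated as "has a scheme and a host", which is a rough heuristic you may need to adjust for the URL forms your model accepts.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def check_image_parts(messages: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in image_url content items (empty if none)."""
    problems: List[str] = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        # Only list-style (multimodal) content can carry image items
        if not isinstance(content, list):
            continue
        for j, part in enumerate(content):
            if not isinstance(part, dict) or part.get("type") != "image_url":
                continue
            where = f"messages[{i}].content[{j}]"
            if msg.get("role") != "user":
                problems.append(f"{where}: images are only allowed in user messages")
            image_url = part.get("image_url")
            if not isinstance(image_url, dict) or not isinstance(image_url.get("url"), str):
                problems.append(f"{where}: image_url must be a dict with a string 'url'")
                continue
            parsed = urlparse(image_url["url"])
            # Rough heuristic: require a scheme (e.g. https) and a host
            if not (parsed.scheme and parsed.netloc):
                problems.append(f"{where}: url does not look like a valid URL")
    return problems


ok = [{"role": "user", "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
]}]
bad = [{"role": "assistant", "content": [
    {"type": "image_url", "image_url": "https://example.com/image.png"},
]}]
print(check_image_parts(ok))        # []
print(len(check_image_parts(bad)))  # 2
```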

Streaming Responses

The API supports server-sent events (SSE) streaming for real-time token generation:

from together import Together

client = Together()

stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming Architecture

sequenceDiagram
    participant Client
    participant APIRequestor
    participant TogetherAPI
    participant Model
    
    Client->>APIRequestor: create(stream=True)
    APIRequestor->>TogetherAPI: POST /chat/completions
    TogetherAPI->>Model: Start inference
    Model-->>TogetherAPI: Token 1
    TogetherAPI-->>APIRequestor: SSE: data: {...}
    APIRequestor-->>Client: ChatCompletionChunk
    Model-->>TogetherAPI: Token 2
    TogetherAPI-->>APIRequestor: SSE: data: {...}
    APIRequestor-->>Client: ChatCompletionChunk
    Note over Model,Client: Streaming continues...
    Model-->>TogetherAPI: [DONE]
    TogetherAPI-->>APIRequestor: [DONE]
    APIRequestor-->>Client: Iterator ends

When streaming is enabled, the synchronous client returns a generator that yields ChatCompletionChunk objects (AsyncTogether returns an async generator, consumed with async for). Each chunk carries an incremental delta; concatenate the deltas to reconstruct the complete response.

Source: src/together/resources/chat/completions.py:40-80
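Accumulating deltas can be sketched without a live connection. Here Delta, Choice, and Chunk are hypothetical stand-in dataclasses that mimic only the attributes used in the streaming example above (choices, index, delta.content, finish_reason); with the SDK you would iterate over the object returned by create(stream=True) instead. Text is collected per choice index so that streams with n > 1 do not interleave.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Delta:  # stand-in for the SDK's delta object
    content: Optional[str] = None


@dataclass
class Choice:  # stand-in for one streamed choice
    index: int
    delta: Delta
    finish_reason: Optional[str] = None


@dataclass
class Chunk:  # stand-in for ChatCompletionChunk
    choices: List[Choice] = field(default_factory=list)


def accumulate(stream: Iterable[Chunk]) -> Dict[int, str]:
    """Join streamed deltas into one string per choice index."""
    parts: Dict[int, List[str]] = defaultdict(list)
    for chunk in stream:
        for choice in chunk.choices:
            # Some chunks carry no text (e.g. the last one, which sets finish_reason)
            if choice.delta.content:
                parts[choice.index].append(choice.delta.content)
    return {idx: "".join(pieces) for idx, pieces in parts.items()}


fake_stream = [
    Chunk([Choice(0, Delta("Quantum "))]),
    Chunk([Choice(0, Delta("computing"))]),
    Chunk([Choice(0, Delta(None), finish_reason="stop")]),
]
print(accumulate(fake_stream))  # {0: 'Quantum computing'}
```

Joining a list once at the end avoids repeated string concatenation on long streams.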

Function Calling

Function calling enables models to invoke predefined tools with structured outputs. This follows the OpenAI function calling schema.

Defining Tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

Tool Execution Flow

graph TD
    A[User Query] --> B[Create with tools]
    B --> C{Model selects tool?}
    C -->|Yes| D[Return tool_call]
    C -->|No| E[Return text response]
    D --> F[Execute function]
    F --> G[tool role message]
    G --> H[Continue with messages]
    H --> B
    E --> I[Final Response]
    
    style D fill:#fff3e0
    style G fill:#e8f5e9

Multi-turn Conversation

After receiving a function call, append the assistant's tool call message and the tool response:

import json

from together import Together

client = Together()

# Initial request with tools (uses the tools list defined above)
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

assistant_msg = response.choices[0].message
# Assumes the model chose to call a tool; tool_calls is empty/None otherwise
tool_call = assistant_msg.tool_calls[0]
print(f"Tool called: {tool_call.function.name}")
print(f"Arguments: {tool_call.function.arguments}")

# Simulate tool execution (arguments arrive as a JSON string)
args = json.loads(tool_call.function.arguments)
tool_result = {"location": args["location"], "temperature": 22, "conditions": "Sunny"}

# Continue conversation: send the assistant turn back as a dict, then the tool result
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    assistant_msg.model_dump(exclude_none=True),
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(tool_result)
    }
]

final_response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)

Note: There is a known issue (#113) where tool/function response messages with role='tool' may encounter validation errors. Ensure the tool_call_id matches exactly and the content is valid JSON.

Source: src/together/resources/chat/completions.py:20-60
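When a model returns several tool calls, a small dispatcher keeps the "Execute function" step of the flow above generic. This sketch is hypothetical: TOOL_REGISTRY, run_tool_calls, and the local get_weather stub are illustrative names, and tool calls are represented as plain dicts in the OpenAI-style shape described in this section (id, function.name, function.arguments as a JSON string). Each result is serialized with json.dumps and tagged with the matching tool_call_id, as the note above recommends.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str, unit: str = "celsius") -> Dict[str, Any]:
    # Hypothetical local implementation; replace with a real weather lookup
    return {"location": location, "temperature": 22, "unit": unit, "conditions": "Sunny"}


TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Execute each requested tool and build the matching role='tool' messages."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"] or "{}")
            result = TOOL_REGISTRY[name](**args)
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Unknown tool, bad arguments, or malformed JSON: report it to the model
            result = {"error": f"{type(exc).__name__}: {exc}"}
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages


calls = [{"id": "call_1", "type": "function",
          "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'}}]
print(run_tool_calls(calls)[0]["tool_call_id"])  # call_1
```

Returning errors as tool content, instead of raising, lets the model see what went wrong and retry or answer without the tool.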

CLI Interface

The Together CLI provides command-line access to chat completions:

# Basic chat completion (--message takes a role, then the content)
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message user "Hello, how are you?"

# Streaming is the default; add --no-stream to receive the full response at once
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message user "Write a story" \
    --no-stream

# With temperature control
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message user "Explain physics" \
    --temperature 0.8

CLI Options

Option	Type	Description
`--message`	`(str, str)` multiple	Message as role-content tuple
`--model`	`str`	Model identifier (required)
`--max-tokens`	`int`	Maximum tokens to generate
`--temperature`	`float`	Sampling temperature
`--top-p`	`int`	Nucleus sampling
`--top-k`	`float`	Top-k sampling
`--stop`	`str` multiple	Stop sequences
`--repetition-penalty`	`float`	Repetition penalty
`--presence-penalty`	`float`	Presence penalty
`--frequency-penalty`	`float`	Frequency penalty
`--min-p`	`float`	Minimum p sampling
`--no-stream`	`flag`	Disable streaming
`--safety-model`	`str`	Moderation model
`--raw`	`flag`	Return raw JSON

Source: src/together/cli/api/chat.py:1-100
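Because --message takes two values (role, content) and can be repeated, a conversation held as a list of dicts maps directly onto CLI arguments. The helper below is a hypothetical convenience, not part of the SDK; it uses only the --model, --message, --temperature, and --no-stream options listed in the table, and shlex.join from the standard library turns the argument list into a copy-pasteable, correctly quoted shell command.

```python
import shlex
from typing import Dict, List, Optional


def build_chat_cli_args(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    stream: bool = True,
) -> List[str]:
    """Translate chat messages into `together chat.completions` arguments."""
    args = ["together", "chat.completions", "--model", model]
    for msg in messages:
        # Each --message takes two values: the role, then the content
        args += ["--message", msg["role"], msg["content"]]
    if temperature is not None:
        args += ["--temperature", str(temperature)]
    if not stream:
        args.append("--no-stream")  # the CLI streams unless told otherwise
    return args


argv = build_chat_cli_args(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    [{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.8,
)
print(shlex.join(argv))
```

The list form can also be passed straight to subprocess.run(argv), which avoids shell quoting entirely.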

Error Handling

The SDK provides structured exception types for different error conditions:

from together import Together
from together.error import (
    TogetherException,
    RateLimitError,
    APIConnectionError,
    Timeout,
    AuthenticationError
)

client = Together()

try:
    response = client.chat.completions.create(
        model="invalid-model-name",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limited: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except APIConnectionError as e:
    print(f"Network error: {e}")
except TogetherException as e:
    print(f"API error: {e}")

Exception Hierarchy

classDiagram
    class TogetherException {
        +message
    }
    class RateLimitError {
        +message
    }
    class APIConnectionError {
        +message
    }
    class Timeout {
        +message
    }
    class AuthenticationError {
        +message
    }
    class FileTypeError {
        +message
    }
    
    TogetherException <|-- RateLimitError
    TogetherException <|-- APIConnectionError
    TogetherException <|-- Timeout
    TogetherException <|-- FileTypeError
    TogetherException <|-- AuthenticationError

Common Error Codes

HTTP Status	Cause	Resolution
`400 Bad Request`	Invalid parameters	Check message format, model name
`401 Unauthorized`	Invalid API key	Verify TOGETHER_API_KEY
`429 Too Many Requests`	Rate limit exceeded	Implement exponential backoff
`500 Internal Error`	Server error	Retry with backoff
`504 Gateway Timeout`	Request timeout	Increase timeout or retry

Source: src/together/error.py:1-80
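The "exponential backoff" resolution for 429 and 5xx errors can be sketched as a generic wrapper for application-level retries. This is not SDK code: call_with_backoff is a hypothetical helper, the retryable exception types are passed in by the caller, and "full jitter" (a random delay between zero and the capped exponential value) is one common strategy among several. The sleep function is injectable so the logic can be exercised without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    initial_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Capped exponential delay: initial, 2x, 4x, ... up to max_delay
            cap = min(max_delay, initial_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # full jitter
    raise RuntimeError("unreachable")


attempts = {"n": 0}


def flaky() -> str:
    # Fails twice with a transient error, then succeeds
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


print(call_with_backoff(flaky, retryable=(ConnectionError,), sleep=lambda s: None))  # ok
```

With the SDK, fn would wrap a call such as lambda: client.chat.completions.create(...) and retryable could be (RateLimitError, Timeout) from together.error. Since the client already retries internally, a small outer retry budget is usually enough.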

Response Format

Standard Response

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}]
)

# Access response attributes
print(response.id)           # chatcmpl-xxx
print(response.model)       # meta-llama/Llama-4-Scout-17B-16E-Instruct
print(response.choices[0].message.content)  # Response text
print(response.usage.prompt_tokens)         # Input tokens
print(response.usage.completion_tokens)    # Output tokens
print(response.usage.total_tokens)         # Total tokens

Streaming Chunk

for chunk in stream:
    # ChatCompletionChunk structure
    print(chunk.id)              # Same ID as final response
    print(chunk.choices[0].delta.content)  # Incremental content (may be None)
    print(chunk.choices[0].finish_reason)  # None until the final chunk, then e.g. 'stop' or 'length'

Async Usage

The SDK provides async variants for concurrent operations:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def multi_chat():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Query {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(multi_chat())
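Launching every request at once with asyncio.gather can run into the 429 rate limits described above. A common pattern is to cap the number of in-flight requests with asyncio.Semaphore. In this sketch, fake_create is a hypothetical stand-in for async_client.chat.completions.create so the code runs offline, and the limit of 2 is arbitrary.

```python
import asyncio
from typing import List

MAX_IN_FLIGHT = 2  # arbitrary; tune to your rate limit
in_flight = 0
peak = 0


async def fake_create(prompt: str) -> str:
    # Stand-in for async_client.chat.completions.create(...)
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulated network latency
    in_flight -= 1
    return f"answer to {prompt}"


async def bounded_gather(prompts: List[str], limit: int) -> List[str]:
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with semaphore:  # at most `limit` calls run at the same time
            return await fake_create(prompt)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(one(p) for p in prompts)))


results = asyncio.run(bounded_gather([f"Query {i}" for i in range(5)], MAX_IN_FLIGHT))
print(results[0], peak)
```

Swapping fake_create for the real async call keeps the same structure.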

Retry Logic and Timeouts

The API requestor implements automatic retry with exponential backoff:

from together.constants import (
    MAX_RETRIES,
    INITIAL_RETRY_DELAY,
    MAX_RETRY_DELAY,
    TIMEOUT_SECS
)

# Default configuration
# MAX_RETRIES: 10
# INITIAL_RETRY_DELAY: 0.5 seconds
# MAX_RETRY_DELAY: 120 seconds
# TIMEOUT_SECS: 600 seconds

# Custom configuration
client = Together(
    max_retries=5,
    timeout=300
)

The retry strategy handles:

Connection timeouts
5xx server errors
Rate limit responses (429)

Source: src/together/abstract/api_requestor.py:1-100

Known Limitations

Issue	Description	Workaround
typer version conflict	SDK requires `typer<0.16.0`	Use virtual environments
Model type validation	Some model types not recognized	Use model names directly
Tool response format	`role='tool'` messages may fail validation	Ensure proper `tool_call_id` and JSON content

For the most current issues and workarounds, refer to the GitHub Issues.

Best Practices

Token Management: Always set max_tokens to prevent runaway generation
Error Handling: Wrap API calls in try-except blocks with appropriate exception handling
Streaming: Use streaming for better perceived latency on long responses
Context Management: Keep message lists manageable; trim old messages when the conversation exceeds the model's context window
Safety: Enable safety_model for user-facing applications
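Trimming old messages can be sketched as: keep the system messages, then add messages from newest to oldest until a token budget is reached. The token count here is a crude assumption (about four characters per token); for real limits, use the model's tokenizer. trim_messages and estimate_tokens are hypothetical helpers, not SDK functions, and a production version should also avoid separating an assistant tool call from its tool reply.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token
    return max(1, len(text) // 4)


def trim_messages(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep system messages plus the most recent turns that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    # Walk backwards so the newest messages are kept first
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "Latest question?"},
]
trimmed = trim_messages(history, budget=120)
print([m["role"] for m in trimmed])
```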

Completions API

Related topics: Chat Completions, Embeddings and Reranking

The Completions API provides access to language model text completion endpoints in the Together AI platform. This API enables developers to generate text completions from various open-source models hosted on Together AI, supporting use cases ranging from code generation to creative writing.

Overview

The Together Python SDK provides two primary APIs for text generation:

Completions API - Designed for legacy text completion models and prompt-based generation
Chat Completions API - Optimized for modern chat-based models with structured message formats

Both APIs support synchronous, asynchronous, and streaming modes of operation.

Source: README.md:1-50

Installation and Setup

Environment Configuration

The SDK requires a Together API key for authentication. You can obtain one from the Together Playground settings page.

export TOGETHER_API_KEY=xxxxx

Client Initialization

from together import Together

# Using environment variable
client = Together()

# Explicit API key
client = Together(api_key="xxxxx")

Source: README.md:10-20

Usage Patterns

Synchronous Completion

The synchronous method blocks until the complete response is received:

from together import Together

client = Together()
response = client.completions.create(
    model="codellama/CodeLlama-34b-Python-hf",
    prompt="Write a Next.js component with TailwindCSS for a header component.",
    max_tokens=200,
)
print(response.choices[0].text)

Source: README.md:80-90

Streaming Completion

Streaming allows real-time response generation by processing chunks as they arrive:

from together import Together

client = Together()
stream = client.completions.create(
    model="codellama/CodeLlama-34b-Python-hf",
    prompt="Write a Next.js component with TailwindCSS for a header component.",
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Source: README.md:92-103

Asynchronous Completion

The async API enables concurrent requests for improved throughput:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()
prompts = [
    "Write a Next.js component with TailwindCSS for a header component.",
    "Write a python function for the fibonacci sequence",
]

async def async_completion(prompts):
    tasks = [
        async_client.completions.create(
            model="codellama/CodeLlama-34b-Python-hf",
            prompt=prompt,
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)

    for response in responses:
        print(response.choices[0].text)

asyncio.run(async_completion(prompts))

Source: README.md:105-125

API Parameters

Core Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model identifier from the available Together AI models
`prompt`	string	Yes	The input prompt for text generation
`max_tokens`	integer	No	Maximum number of tokens to generate
`temperature`	float	No	Sampling temperature (0.0-2.0, default varies by model)
`top_p`	float	No	Nucleus sampling probability threshold
`top_k`	integer	No	Top-k sampling parameter
`stream`	boolean	No	Enable streaming response (default: false)
`n`	integer	No	Number of completions to generate
`stop`	string/array	No	Stop sequence(s) to end generation
`logprobs`	integer	No	Number of top log probabilities to return
`echo`	boolean	No	Echo the prompt in the response
`repetition_penalty`	float	No	Penalty for token repetition
`presence_penalty`	float	No	Penalize tokens based on presence
`frequency_penalty`	float	No	Penalize tokens based on frequency
`min_p`	float	No	Minimum probability threshold for sampling
`safety_model`	string	No	Moderation model to use

Source: src/together/cli/api/completions.py:1-50

Parameter Details

#### Sampling Parameters

temperature: Controls randomness in generation. Lower values (0.1-0.3) produce more deterministic output, while higher values (0.7-1.0) increase creativity.
top_p: Also known as nucleus sampling; the model samples only from the smallest set of tokens whose cumulative probability reaches top_p.
top_k: Limits token selection to the top k most probable tokens.
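The three sampling controls can be illustrated on a toy next-token distribution. This is a generic sketch of how temperature, top-k, and top-p filtering are commonly applied (in that order, renormalizing after each filter); it is not Together's server-side implementation, and the logits are made up.

```python
import math
from typing import Dict, List, Optional, Tuple


def renormalize(probs: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    mass = sum(p for _, p in probs)
    return [(tok, p / mass) for tok, p in probs]


def sampling_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Return the token probabilities left after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # stable softmax
    total = sum(weights.values())
    probs = sorted(((tok, w / total) for tok, w in weights.items()),
                   key=lambda kv: kv[1], reverse=True)

    if top_k is not None:
        probs = renormalize(probs[:top_k])  # keep only the k most probable tokens
    if top_p is not None:
        kept: List[Tuple[str, float]] = []
        cumulative = 0.0
        for tok, p in probs:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:  # smallest set whose mass reaches top_p
                break
        probs = renormalize(kept)
    return dict(probs)


logits = {"cat": 2.0, "dog": 1.0, "fish": 0.5, "rock": -1.0}
print(sampling_distribution(logits, temperature=0.5, top_p=0.9))
```

Lowering the temperature concentrates probability on "cat", so the same top_p keeps fewer candidate tokens.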

#### Repetition Control

repetition_penalty: Values > 1.0 discourage repetition, values < 1.0 encourage it.
presence_penalty: Positive values encourage discussing new topics.
frequency_penalty: Positive values reduce repetition of high-frequency tokens.
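The penalties can be sketched on raw logits using two widely used formulations: the multiplicative repetition penalty (divide positive logits, and multiply negative ones, by the penalty for tokens already generated) and the additive presence/frequency penalties popularized by OpenAI's API. Whether Together's servers use exactly these formulas is an assumption; the sketch only shows the direction of each effect.

```python
from collections import Counter
from typing import Dict, List


def apply_penalties(
    logits: Dict[str, float],
    generated: List[str],
    repetition_penalty: float = 1.0,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
) -> Dict[str, float]:
    """Adjust next-token logits based on tokens that were already generated."""
    counts = Counter(generated)
    adjusted = {}
    for tok, logit in logits.items():
        c = counts.get(tok, 0)
        if c > 0 and repetition_penalty != 1.0:
            # Multiplicative: >1.0 pushes seen tokens down, <1.0 pulls them up
            logit = logit / repetition_penalty if logit > 0 else logit * repetition_penalty
        # Additive: frequency scales with the count, presence applies once if seen
        logit -= c * frequency_penalty + (1.0 if c > 0 else 0.0) * presence_penalty
        adjusted[tok] = logit
    return adjusted


logits = {"the": 3.0, "cat": 2.0, "sat": 1.0}
history = ["the", "cat", "the"]
print(apply_penalties(logits, history, repetition_penalty=1.2))
print(apply_penalties(logits, history, presence_penalty=0.5, frequency_penalty=0.5))
```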

CLI Usage

The SDK includes a command-line interface for completions:

together completions "Your prompt here" --model codellama/CodeLlama-34b-Python-hf

CLI Options

Option	Short	Description	Default
`--model`	-m	Model name	Required
`--max-tokens`	-t	Max tokens to generate	None
`--temperature`	-T	Sampling temperature	None
`--top-p`	-p	Top p sampling	None
`--top-k`	-k	Top k sampling	None
`--stop`	-s	Stop sequences (multiple allowed)	None
`--no-stream`	-ns	Disable streaming	False
`--repetition-penalty`	-rp	Repetition penalty	None
`--presence-penalty`	-pp	Presence penalty	None
`--frequency-penalty`	-fp	Frequency penalty	None
`--min-p`	-mp	Minimum p	None
`--logprobs`	-l	Return log probabilities	None
`--echo`	-e	Echo prompt in response	False
`--n`	-n	Number of generations	None
`--safety-model`	-sm	Moderation model	None
`--raw`	-r	Return raw JSON response	False

Source: src/together/cli/api/completions.py:1-75

CLI Streaming Output

When streaming is enabled (default), the CLI processes chunks in real-time:

if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices

        if raw:
            click.echo(f"{json.dumps(chunk.model_dump(exclude_none=True))}")
            continue

        for stream_choice in sorted(chunk.choices, key=lambda c: c.index):
            assert isinstance(stream_choice, CompletionChoicesChunk)
            assert stream_choice.delta
            click.echo(f"{stream_choice.delta.content}", nl=False)

Source: src/together/cli/api/completions.py:45-65

Response Structure

Completion Response

Field	Type	Description
`id`	string	Unique identifier for the completion
`choices`	array	Array of completion choices
`choices[].text`	string	Generated text content
`choices[].index`	integer	Choice index for multiple completions
`choices[].finish_reason`	string	Reason for completion ending
`model`	string	Model used for generation
`usage`	object	Token usage statistics

Streaming Chunk Response

Field	Type	Description
`id`	string	Chunk identifier
`choices`	array	Array of delta choices
`choices[].delta`	object	Incremental text delta
`choices[].delta.content`	string	Delta text content
`choices[].index`	integer	Choice index

Architecture

Request Flow

graph TD
    A[Client.completions.create] --> B[Validate Parameters]
    B --> C[APIRequestor]
    C --> E[POST request to the Together API]
    E --> D{Streaming?}
    D -->|No| G[Parse Response]
    G --> H[Return CompletionResponse]
    D -->|Yes| F[Return Chunk Iterator]
    F --> I[Stream Chunks]
    I --> J[Yield CompletionChunk]

Response Handling

graph TD
    A[API Response] --> B{Streaming Mode?}
    B -->|Yes| C[Return chunk generator]
    B -->|No| D[Return TogetherResponse]
    C --> E[CompletionChunk]
    D --> F[CompletionResponse]

Error Handling

Exception Types

The SDK defines specific exception types for different error conditions:

Exception	Description
`TogetherException`	Base exception class
`RateLimitError`	API rate limit exceeded
`APIConnectionError`	Network connectivity issues
`Timeout`	Request timeout
`FileTypeError`	Invalid file type
`AttributeError`	Invalid attribute access

Source: src/together/error.py:1-60

Error Response Structure

from typing import Optional

from pydantic import BaseModel

class TogetherErrorResponse(BaseModel):
    message: str
    type: str
    code: Optional[str] = None
    param: Optional[str] = None

Common Error Scenarios

Rate Limiting: When API rate limits are exceeded, the SDK automatically retries with exponential backoff based on configuration.

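Timeout: The default request timeout comes from TIMEOUT_SECS in together.constants. Read the value from your installed version and override it per client when needed:

from together import Together
from together.constants import TIMEOUT_SECS

print(TIMEOUT_SECS)            # installed default, in seconds
client = Together(timeout=60)  # per-client override, in seconds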

Source: src/together/abstract/api_requestor.py:20-40

Invalid Model: Returns validation error with available model list

Configuration Options

Client Configuration

Parameter	Type	Default	Description
`api_key`	string	env: TOGETHER_API_KEY	API authentication key
`base_url`	string	api.together.ai	API base URL
`timeout`	integer	60	Request timeout in seconds
`max_retries`	integer	3	Maximum retry attempts

Retry Configuration

MAX_RETRIES = 3
INITIAL_RETRY_DELAY = 0.5  # seconds
MAX_RETRY_DELAY = 2.0  # seconds
MAX_CONNECTION_RETRIES = 2
MAX_SESSION_LIFETIME_SECS = 300

Source: src/together/abstract/api_requestor.py:20-40

Known Limitations and Issues

Dependency Conflicts

The SDK has a dependency on typer<0.16.0, which may cause conflicts with projects requiring newer versions of typer. This is a known issue tracked in #348.

Variable Scope Issue

A known UnboundLocalError issue can occur in certain error scenarios when the result variable is referenced before assignment. This is being tracked in #143.

Best Practices

Efficient Usage

Use Streaming for Long Outputs: When expecting long completions, use streaming to improve perceived latency
Batch Requests with Async: Use AsyncTogether for parallel API calls
Set Appropriate Limits: Configure max_tokens to prevent excessive generation

Production Considerations

Implement Retry Logic: The SDK handles retries, but implement additional logic for critical operations
Monitor Token Usage: Track usage via response usage field
Use Safety Models: Enable moderation for user-facing applications
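Monitoring usage across many calls only requires the usage object on each response (prompt_tokens and completion_tokens). The UsageTracker class below is a hypothetical helper, not part of the SDK; it reads those attributes with getattr, and SimpleNamespace objects stand in for real responses so the sketch runs offline.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class UsageTracker:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def record(self, response: Any) -> None:
        """Add one response's token counts; responses without usage are skipped."""
        usage = getattr(response, "usage", None)
        if usage is None:
            return
        self.prompt_tokens += getattr(usage, "prompt_tokens", 0) or 0
        self.completion_tokens += getattr(usage, "completion_tokens", 0) or 0
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


tracker = UsageTracker()
# Stand-ins for responses returned by client.completions.create(...)
tracker.record(SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=200)))
tracker.record(SimpleNamespace(usage=SimpleNamespace(prompt_tokens=8, completion_tokens=50)))
print(tracker.calls, tracker.total_tokens)  # 2 270
```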

Embeddings and Reranking

Related topics: Chat Completions, Files API

The Together Python SDK provides first-class support for text embeddings and document reranking through dedicated resource classes. These features enable semantic search, document retrieval, and information discovery workflows by converting text into dense vector representations and reordering search results based on relevance.

Overview

Embeddings and reranking are complementary capabilities that power modern retrieval-augmented generation (RAG) and search systems. The SDK exposes these through the embeddings and rerank namespaces on the main Together client, following a consistent pattern with other API resources like chat completions.

graph LR
    A[Text Input] --> B[Embeddings API]
    B --> C[Vector Embeddings]
    C --> H[Similarity Search]
    G[Document Pool] --> H
    H --> I[Candidate Documents]
    F[Query] --> D[Reranking API]
    I --> D
    D --> E[Re-ranked Results]

Key characteristics:

Both endpoints use the same Together client instance
Responses are returned as Pydantic model objects for type safety
Both support synchronous and async patterns via Together and AsyncTogether
Replacing newlines in input text with spaces is recommended for best results

Embeddings

Purpose and Use Cases

The Embeddings API converts text into high-dimensional vector representations that capture semantic meaning. These vectors can be stored in vector databases and used for similarity search, clustering, or as features for downstream ML tasks.

Common use cases include:

Semantic search systems
Document clustering and categorization
Recommendation systems
Duplicate detection
Feature extraction for classification tasks
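Duplicate detection, for example, reduces to comparing embedding vectors pairwise. The sketch below computes cosine similarity in pure Python and flags pairs at or above a threshold. The short vectors are made up for illustration (real vectors from client.embeddings.create are much longer), and the 0.95 threshold is an arbitrary assumption to tune on your own data.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_near_duplicates(
    vectors: List[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(vectors), 2):
        sim = cosine_similarity(a, b)
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Toy 3-dimensional "embeddings": items 0 and 1 point in nearly the same direction
toy = [[1.0, 0.0, 0.2], [0.98, 0.05, 0.21], [0.0, 1.0, 0.0]]
print(find_near_duplicates(toy))
```

The pairwise loop is quadratic in the number of items; for large collections, a vector index with approximate nearest-neighbor search is the usual replacement.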

Python Client Usage

from typing import List
from together import Together

client = Together()

def get_embeddings(texts: List[str], model: str) -> List[List[float]]:
    # Normalize newlines as recommended by the SDK
    texts = [text.replace("\n", " ") for text in texts]
    
    outputs = client.embeddings.create(model=model, input=texts)
    
    # Extract embedding vectors in order
    return [outputs.data[i].embedding for i in range(len(texts))]

# Example usage
input_texts = ["Our solar system orbits the Milky Way galaxy at about 515,000 mph"]
embeddings = get_embeddings(
    input_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval"
)
print(embeddings)

Embeddings Response Model

The EmbeddingsCreateResponse model provides structured access to API responses:

Field	Type	Description
`object`	`str`	Object type, typically `"list"`
`data`	`List[Embedding]`	List of embedding objects
`model`	`str`	Model used for embeddings
`usage`	`EmbeddingUsage`	Token usage statistics

Each Embedding object contains:

Field	Type	Description
`object`	`str`	Object type, typically `"embedding"`
`embedding`	`List[float]`	The embedding vector
`index`	`int`	Position in the input list

The EmbeddingUsage object tracks:

Field	Type	Description
`prompt_tokens`	`int`	Tokens in the input
`total_tokens`	`int`	Total tokens processed

API Parameters

Parameter	Type	Required	Default	Description
`model`	`str`	Yes	-	Embedding model identifier
`input`	`Union[str, List[str]]`	Yes	-	Text(s) to embed

Available Embedding Models

The SDK works with embedding models available on the Together platform. Common models include:

togethercomputer/m2-bert-80M-8k-retrieval - 8K context, 80M parameters
togethercomputer/m2-bert-80M-2k-retrieval - 2K context, 80M parameters

Model availability can be queried using:

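models = client.models.list()
# Filter for embedding models (assumes each model object exposes a type field)
embedding_models = [m.id for m in models if m.type == "embedding"]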

Reranking

Purpose and Use Cases

The Reranking API takes a query and a set of documents, then returns those documents reordered by relevance to the query. This is particularly valuable when combined with embeddings-based retrieval to refine initial search results.

Common use cases include:

Improving search result quality after initial embedding-based retrieval
Multi-stage retrieval pipelines
Reordering candidates from vector similarity search
Question answering systems retrieving relevant context

Python Client Usage

from typing import List
from together import Together

client = Together()

def get_reranked_documents(
    query: str,
    documents: List[str],
    model: str,
    top_n: int = 3
) -> List[str]:
    outputs = client.rerank.create(
        model=model,
        query=query,
        documents=documents,
        top_n=top_n
    )

    # Sort results by relevance score; each result's index points into documents
    ranked = sorted(outputs.results, key=lambda r: r.relevance_score, reverse=True)
    return [documents[r.index] for r in ranked]

# Example usage
query = "What is the capital of the United States?"
documents = ["New York", "Washington, D.C.", "Los Angeles"]

reranked = get_reranked_documents(
    query, documents, model="Salesforce/Llama-Rank-V1", top_n=3
)
print(reranked)  # "Washington, D.C." is expected to rank first

Reranking Response Model

The RerankResponse model provides structured access to reranking results:

Field	Type	Description
`id`	`str`	Request identifier
`results`	`List[Ranking]`	List of ranked documents
`meta`	`RerankMeta`	Metadata including model and usage
`object`	`str`	Object type

Each Ranking object contains:

Field	Type	Description
`index`	`int`	Original document index
`relevance_score`	`float`	Relevance score (higher = more relevant)
`document`	`Document`	The document object with text

The Document object:

Field	Type	Description
`text`	`str`	Document text content

The RerankMeta object:

Field	Type	Description
`model_id`	`str`	Model used for reranking
`usage`	`RerankUsage`	Token usage statistics

API Parameters

Parameter	Type	Required	Default	Description
`model`	`str`	Yes	-	Reranking model identifier
`query`	`str`	Yes	-	The query to rank documents against
`documents`	`List[str]`	Yes	-	Documents to be ranked
`top_n`	`int`	No	`3`	Number of top results to return
`max_chunks_per_doc`	`int`	No	`None`	Max chunks per document (model-dependent)
`return_documents`	`bool`	No	`True`	Whether to include document text in response

Combined Workflow

A typical retrieval pipeline combines embeddings and reranking:

graph TD
    A[User Query] --> B[Embed Query]
    C[Document Corpus] --> D[Embed All Documents]
    B --> E[Vector Similarity Search]
    D --> E
    E --> F[Candidate Documents]
    F --> G[Rerank with Query]
    G --> H[Final Results]
    
    I[Vector Database] <--> D

Complete Example

from typing import List
from together import Together

client = Together()

EMBEDDING_MODEL = "togethercomputer/m2-bert-80M-8k-retrieval"
RERANK_MODEL = "BAAI/bge-reranker"

def semantic_search(
    query: str,
    documents: List[str],
    embedding_model: str = EMBEDDING_MODEL,
    rerank_model: str = RERANK_MODEL,
    top_k: int = 10,
    final_k: int = 3
) -> List[dict]:
    """
    Combined embeddings + reranking search pipeline.
    """
    # Step 1: Embed the query
    query_embedding = client.embeddings.create(
        model=embedding_model,
        input=query.replace("\n", " ")
    ).data[0].embedding
    
    # Step 2: Embed all documents
    doc_embeddings = client.embeddings.create(
        model=embedding_model,
        input=[doc.replace("\n", " ") for doc in documents]
    )
    
    # Step 3: Cosine similarity (for demonstration)
    # In production, use a proper vector database
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    similarities = [
        (i, cosine(query_embedding, doc_emb.embedding))
        for i, doc_emb in enumerate(doc_embeddings.data)
    ]

    # Sort by similarity and take top_k
    similarities.sort(key=lambda x: x[1], reverse=True)
    candidate_indices = [idx for idx, _ in similarities[:top_k]]
    candidate_docs = [documents[i] for i in candidate_indices]

    # Step 4: Rerank candidates
    rerank_results = client.rerank.create(
        model=rerank_model,
        query=query,
        documents=candidate_docs,
        top_n=final_k
    )

    # Step 5: Extract final results with scores.
    # result.index points into candidate_docs, so map it back to the corpus.
    results = []
    for result in rerank_results.results:
        results.append({
            "document": candidate_docs[result.index],
            "relevance_score": result.relevance_score,
            "original_index": candidate_indices[result.index]
        })
    
    return results

# Usage
query = "machine learning optimization techniques"
corpus = [
    "Gradient descent is a first-order iterative optimization algorithm.",
    "The capital of France is Paris.",
    "Stochastic gradient descent uses random subsets of data.",
    "Climate change affects global weather patterns.",
    "Adam optimizer combines momentum and RMSprop concepts."
]

results = semantic_search(query, corpus)
for r in results:
    print(f"Score: {r['relevance_score']:.4f} - {r['document']}")

Async Usage

Both embeddings and reranking support asynchronous operations:

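import asyncio
from typing import List
from together import AsyncTogether

async_client = AsyncTogether()

# Example input: texts grouped into batches (illustrative data)
batched_documents: List[List[str]] = [
    ["Doc 1", "Doc 2"],
    ["Doc 3", "Doc 4"],
]

async def async_embeddings():
    tasks = [
        async_client.embeddings.create(
            model="togethercomputer/m2-bert-80M-8k-retrieval",
            input=texts
        )
        for texts in batched_documents
    ]
    results = await asyncio.gather(*tasks)
    return results

async def async_rerank():
    return await async_client.rerank.create(
        model="BAAI/bge-reranker",
        query="What is deep learning?",
        documents=["Doc 1", "Doc 2", "Doc 3"],
        top_n=3
    )

async def main():
    # Run both workloads concurrently inside a single event loop
    embedding_batches, rerank_response = await asyncio.gather(
        async_embeddings(), async_rerank()
    )
    print(len(embedding_batches), len(rerank_response.results))

# Run
asyncio.run(main())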

CLI Support

The CLI commands covered in this manual (together completions and together chat.completions) are text-generation commands. Passing an embedding model such as togethercomputer/m2-bert-80M-8k-retrieval to together completions sends a completion request; it does not return embedding vectors. Use the Python API (client.embeddings.create and client.rerank.create) for embeddings and reranking.

Error Handling

Both resources can raise standard Together exceptions defined in src/together/error.py:

Error Type	Description
`TogetherException`	Base exception class
`RateLimitError`	API rate limit exceeded
`APIConnectionError`	Network connectivity issues

from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.embeddings.create(
        model="togethercomputer/m2-bert-80M-8k-retrieval",
        input="Sample text"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except TogetherException as e:
    print(f"API error: {e}")

Input Text Normalization

The SDK documentation recommends normalizing newline characters in input text:

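from together import Together

client = Together()
texts = ["First line\nsecond line", "Another document"]

# Recommended: replace newlines with spaces
normalized_texts = [text.replace("\n", " ") for text in texts]

# Create embeddings
response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=normalized_texts
)

Replacing newlines with spaces keeps inputs uniform, so differences in line wrapping are less likely to affect the resulting embeddings.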

Known Limitations

Based on community feedback and issue tracking:

Model availability: Embedding and reranking model availability may vary. Always verify model identifiers against the Together model marketplace.

Batch sizes: Large batches of documents may require multiple API calls. Consider batching strategies for large document collections.

Token limits: Both APIs have token limits that may restrict single-request document counts. Monitor usage fields in responses.
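One way to split a large collection is to cap each request by both document count and an approximate size budget. The sketch below uses character count as a rough proxy for tokens; batch_documents is a hypothetical helper, and its limits (max_docs=64, max_chars=20_000) are illustrative assumptions, not documented Together API limits.

```python
from typing import Iterator, List


def batch_documents(
    docs: List[str],
    max_docs: int = 64,
    max_chars: int = 20_000,
) -> Iterator[List[str]]:
    """Yield consecutive batches that respect both a count and a size budget.

    A single document longer than max_chars is emitted as its own batch
    rather than being split or dropped.
    """
    batch: List[str] = []
    batch_chars = 0
    for doc in docs:
        # Close the current batch if adding this doc would exceed either budget
        if batch and (len(batch) >= max_docs or batch_chars + len(doc) > max_chars):
            yield batch
            batch, batch_chars = [], 0
        batch.append(doc)
        batch_chars += len(doc)
    if batch:
        yield batch


docs = ["a" * 10, "b" * 10, "c" * 25, "d" * 5]
for batch in batch_documents(docs, max_docs=2, max_chars=30):
    print([len(d) for d in batch])  # [10, 10] then [25, 5]
```

Each yielded batch can then be sent as the input list of one client.embeddings.create call, or as the documents of one rerank request.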

Image Generation

The Image Generation module in together-python provides programmatic access to Together AI's image synthesis API, which generates images from text prompts using diffusion models. It supports synchronous and asynchronous requests, includes a CLI, and returns images either as Base64-encoded data or as URLs.

Overview

The Image Generation feature is part of the Together AI Python SDK that abstracts the complexity of API communication and response parsing. It allows developers to:

Generate images from text prompts using supported diffusion models
Configure generation parameters such as dimensions, steps, and seed
Support negative prompts to guide generation away from unwanted elements
Return images as Base64-encoded data or URLs
Use the same client that serves chat completions, embeddings, and the other SDK resources

Image generation is accessed through the client.images namespace in the main Together client, following a consistent pattern used throughout the SDK. Source: src/together/resources/images.py:1-50

Architecture

Component Overview

The image generation system consists of several interconnected components that work together to provide a unified interface:

graph TD
    A[User Code] --> B[Together Client]
    B --> C[Images Resource]
    C --> D[APIRequestor]
    D --> E[Together API]
    E --> F[ImageResponse]
    F --> G[User Code]
    
    H[CLI Command] --> C
    I[ImageCLI] --> B
    
    J[ImageRequest Type] --> C
    K[ImageResponse Type] --> F

Module Structure

Component	File Path	Purpose
Images Resource	`src/together/resources/images.py`	Main API client for image generation
Image Types	`src/together/types/images.py`	Pydantic models for request/response validation
CLI Module	`src/together/cli/api/images.py`	Command-line interface for image generation
File Utils	`src/together/utils/files.py`	Helper utilities for file operations

Source: src/together/resources/images.py

API Reference

Client Method: `client.images.generate()`

The primary method for generating images. On the synchronous Together client it is a regular method; AsyncTogether provides an awaitable version with the same parameters.

Signature:

def generate(
    self,
    prompt: str,
    model: str,
    *,
    steps: Optional[int] = 20,
    seed: Optional[int] = None,
    n: int = 1,
    height: int = 1024,
    width: int = 1024,
    negative_prompt: Optional[str] = None,
    **kwargs,
) -> ImageResponse

#### Parameters

Parameter	Type	Default	Description
`prompt`	`str`	Required	Text description of the desired image
`model`	`str`	Required	Model identifier (e.g., `stabilityai/stable-diffusion-xl-base-1.0`)
`steps`	`int`	`20`	Number of diffusion steps
`seed`	`int`	`None`	Random seed for reproducible generation
`n`	`int`	`1`	Number of images to generate
`height`	`int`	`1024`	Output image height in pixels
`width`	`int`	`1024`	Output image width in pixels
`negative_prompt`	`str`	`None`	Prompt describing elements to avoid
`**kwargs`	`Any`	N/A	Additional model-specific parameters

Source: src/together/resources/images.py:36-60

#### Returns

Field	Type	Description
`data`	`List[ImageChoicesData]`	List of generated image objects
`data[0].b64_json`	`str`	Base64-encoded PNG image data
`data[0].url`	`str`	Remote URL to the generated image (if available)
`data[0].revised_prompt`	`str`	Prompt revised by the model's safety filter

Source: src/together/types/images.py

Response Type: `ImageResponse`

The ImageResponse object wraps the API response; its data field holds the generated images:

class ImageResponse(TogetherBaseResponse):
    data: List[ImageChoicesData]

Source: src/together/types/images.py

Usage Patterns

Basic Synchronous Usage

from together import Together

client = Together()

response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=4,
)

# Access base64-encoded images
for image_data in response.data:
    print(image_data.b64_json)

# Access revised prompt (if modified by safety filter)
for image_data in response.data:
    if image_data.revised_prompt:
        print(f"Revised prompt: {image_data.revised_prompt}")

Source: README.md

Using Seed for Reproducibility

from together import Together

client = Together()

# Generate with a fixed seed for reproducible results
response = client.images.generate(
    prompt="a serene mountain landscape at sunset",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    seed=42,
    n=1,
    height=768,
    width=768,
)

Multiple Images in Single Request

import base64

from together import Together

client = Together()

response = client.images.generate(
    prompt="a bowl of fresh fruit",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Generate 4 variations
    width=512,
    height=512,
)

# Decode each Base64 image and save it to disk
for idx, image_data in enumerate(response.data):
    image_bytes = base64.b64decode(image_data.b64_json)
    with open(f"generated_image_{idx}.png", "wb") as f:
        f.write(image_bytes)

CLI Interface

The Together CLI provides a convenient interface for image generation without writing Python code.

Command Structure

together images generate "prompt text" --model <MODEL_NAME> [OPTIONS]

Source: src/together/cli/api/images.py:1-30

CLI Options

Option	Type	Default	Description
`--model`	`str`	Required	Model name to use for generation
`--steps`	`int`	`20`	Number of diffusion steps
`--seed`	`int`	`None`	Random seed for reproducibility
`--n`	`int`	`1`	Number of images to generate
`--height`	`int`	`1024`	Image height in pixels
`--width`	`int`	`1024`	Image width in pixels
`--negative-prompt`	`str`	`None`	Elements to avoid in generation
`--output`	`path`	`.`	Output directory for generated images
`--prefix`	`str`	`image-`	Filename prefix for saved images
`--no-show`	`flag`	`False`	Do not open images in viewer

Source: src/together/cli/api/images.py:31-70

CLI Usage Examples

Basic image generation:

together images generate "space robots" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --n 4

Custom dimensions with reproducible seed:

together images generate "mountain landscape" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --seed 12345 \
  --width 512 \
  --height 768 \
  --steps 30

Save to specific directory without viewing:

together images generate "abstract art" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output ./generated_images \
  --prefix "artwork-" \
  --no-show

Image Display Behavior

By default, the CLI automatically opens generated images in the system's default image viewer using the PIL.Image library. This behavior can be disabled with the --no-show flag. Source: src/together/cli/api/images.py:70-90

Supported Models

The Together AI platform supports various image generation models. The SDK allows any compatible model identifier to be passed directly:

Model Family	Example Model Identifier	Typical Use
Stable Diffusion XL	`stabilityai/stable-diffusion-xl-base-1.0`	General purpose generation
Flux	`black-forest-labs/FLUX.1-dev`	High-quality artistic generation
Playground	`playgroundai/playground-v2.5`	Versatile creative work

Source: README.md

To list all available image generation models programmatically:

from together import Together

client = Together()
models = client.models.list()  # returns a list of model objects

# Filter for image models
image_models = [m for m in models if m.type == "image"]
for model in image_models:
    print(f"{model.display_name}: {model.id}")

Request/Response Flow

sequenceDiagram
    participant User
    participant Client
    participant ImagesResource
    participant APIRequestor
    participant TogetherAPI
    participant ImageResponse
    
    User->>Client: client.images.generate(...)
    Client->>ImagesResource: generate(prompt, model, ...)
    ImagesResource->>ImageRequest: Create ImageRequest
    ImagesResource->>APIRequestor: arequest(POST /images/generations)
    APIRequestor->>TogetherAPI: HTTP POST Request
    TogetherAPI-->>APIRequestor: JSON Response
    APIRequestor-->>ImagesResource: TogetherResponse
    ImagesResource->>ImageResponse: Parse response data
    ImageResponse-->>User: ImageResponse with image data
    
    Note over User,TogetherAPI: Base64 images available in response.data[].b64_json

Source: src/together/resources/images.py:40-70

Common Issues and Troubleshooting

Pillow Version Compatibility

Some users have reported transitive dependency conflicts with the pillow library. The SDK depends on specific pillow versions for image handling and display features in the CLI. If you encounter conflicts with other packages requiring newer pillow versions, consider using separate virtual environments. Source: GitHub Issue #237

Large Image Base64 Handling

Generated images are returned as Base64-encoded strings in the b64_json field. When processing large images or many images, make sure your application has enough memory: each Base64 string is about one third larger than the decoded image it represents. The 10MB Base64 image limit checked in src/together/utils/files.py is a validation rule for images embedded in fine-tuning datasets (see File Validation), not a documented cap on generated output.
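Base64 encodes every 3 bytes as 4 characters, so the decoded size of a b64_json string can be computed before decoding it, for example to budget memory. A minimal sketch using only the standard library; decoded_size is a hypothetical helper, not part of the SDK, and it assumes a padded string without embedded line breaks.

```python
import base64


def decoded_size(b64: str) -> int:
    """Return the number of bytes a padded Base64 string decodes to.

    Assumes standard padding and no embedded whitespace or line breaks.
    """
    b64 = b64.strip()
    # Only the last two characters can be '=' padding
    padding = b64.count("=", max(len(b64) - 2, 0))
    return len(b64) * 3 // 4 - padding


# A fake 1004-byte "image" standing in for real PNG data
payload = base64.b64encode(b"\x89PNG" + bytes(1000)).decode("ascii")
print(len(payload), decoded_size(payload))  # 1340 1004
print(len(base64.b64decode(payload)))       # 1004
```

The ratio 1340 / 1004 is roughly 4 / 3, which is where the one-third overhead of holding Base64 text in memory comes from.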

API Key Configuration

Image generation requires a valid Together API key. Ensure the TOGETHER_API_KEY environment variable is set or passed directly to the client:

from together import Together

# Via environment variable
# export TOGETHER_API_KEY=your_api_key

client = Together()  # Reads from environment

# Or explicitly
client = Together(api_key="your_api_key")

Rate Limiting

Like other API endpoints, image generation is subject to rate limits. If you encounter RateLimitError, implement exponential backoff in your application:

import time
from together import Together
from together.error import RateLimitError

client = Together()
max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.images.generate(
            prompt="your prompt",
            model="stabilityai/stable-diffusion-xl-base-1.0"
        )
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            time.sleep(wait_time)
        else:
            raise

Source: src/together/error.py:40-55
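The retry loop above can be factored into a reusable helper. This sketch adds random "full jitter" so that many clients do not retry in lockstep. with_backoff is a hypothetical helper, and FakeRateLimitError stands in for together.error.RateLimitError so the example runs offline. The sleep function is injectable so the delays can be inspected without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    The delay before retry k (0-based) is drawn uniformly from
    [0, base_delay * 2**k]. The final failure is re-raised unchanged.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("max_retries must be at least 1")


class FakeRateLimitError(Exception):
    """Stand-in for together.error.RateLimitError in this offline example."""


calls = {"count": 0}


def flaky_request() -> str:
    # Fails twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("rate limited")
    return "ok"


delays = []
result = with_backoff(flaky_request, (FakeRateLimitError,), sleep=delays.append)
print(result, len(delays))  # ok 2
```

With the SDK, the call would be wrapped in a lambda, for example `with_backoff(lambda: client.images.generate(prompt="your prompt", model="stabilityai/stable-diffusion-xl-base-1.0"), (RateLimitError,))`.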

Error Handling

The SDK provides specific exception types for various error conditions:

Exception Type	Description
`TogetherException`	Base exception for all SDK errors
`RateLimitError`	API rate limit exceeded
`APIConnectionError`	Network connectivity issues
`Timeout`	Request timeout

Source: src/together/error.py

Example error handling:

from together import Together
from together.error import TogetherException, RateLimitError, Timeout

client = Together()

try:
    response = client.images.generate(
        prompt="test image",
        model="stabilityai/stable-diffusion-xl-base-1.0"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except Timeout:
    print("Request timed out. The image may be complex - try with fewer steps.")
except TogetherException as e:
    print(f"API error: {e}")

Best Practices

1. Optimize Image Dimensions

For faster generation, use smaller dimensions initially and upscale if needed:

# Faster initial generation
response = client.images.generate(
    prompt="landscape",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    height=512,
    width=512,
    steps=10,  # Fewer steps than the default of 20, for a quick draft
)

2. Use Seeds for Iteration

When refining a concept, use a fixed seed to maintain consistency:

base_seed = 42
seasons = ["spring", "summer", "autumn", "winter"]

# Vary only the prompt; the fixed seed tends to keep the composition similar
for season in seasons:
    response = client.images.generate(
        prompt=f"landscape with {season} colors",
        model="stabilityai/stable-diffusion-xl-base-1.0",
        seed=base_seed,
    )

3. Batch Generation

Generate multiple images in a single request when possible for efficiency:

response = client.images.generate(
    prompt="concept variations",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Single API call for 4 images
)

4. Handle Revised Prompts

The API may modify prompts for safety reasons. Always check for revised prompts:

prompt = "your prompt here"

response = client.images.generate(
    prompt=prompt,
    model="stabilityai/stable-diffusion-xl-base-1.0",
)

for image_data in response.data:
    if image_data.revised_prompt and image_data.revised_prompt != prompt:
        print(f"Prompt was revised to: {image_data.revised_prompt}")

Files API

The Files API provides capabilities for uploading, managing, and validating training datasets for use with Together AI's fine-tuning services. It serves as the foundation for preparing training data that powers model customization workflows.

Overview

The Files API enables developers to:

Upload training datasets in JSONL format for fine-tuning jobs
Validate file content locally before uploading to catch formatting errors early
Manage remote files (list, retrieve, delete) on Together's infrastructure
Support multimodal content including text and image data for vision model training

Source: README.md

Architecture

graph TD
    A[User Code / CLI] --> B[Files API Client]
    B --> C[FileManager]
    C --> D[Together API]
    
    E[Local Validation] --> B
    E --> F[files.py utils]
    F --> G[JSONL Parser]
    G --> H[Content Validators]
    
    I[Fine-tuning] --> D
    I --> C
    
    style D fill:#e1f5fe
    style C fill:#fff3e0
    style F fill:#f3e5f5

Component Overview

Component	File	Responsibility
`Together` client	`client.py`	Main API entry point
`FileManager`	`filemanager.py`	Handles file operations
`files.py` utils	`utils/files.py`	Local validation and parsing
CLI commands	`cli/api/files.py`	Command-line interface

Source: src/together/filemanager.py

File Validation

The SDK provides robust local validation capabilities through the files.py utility module. This validation runs before uploads to catch formatting errors early, preventing failed fine-tuning jobs due to malformed data.

Validation Rules

The validator checks multiple aspects of your JSONL files:

Validation Rule	Description	Error Type
`content` field type	Must be a list of dicts	`InvalidFileFormatError`
`type` field presence	Each item must have a `type` field	`InvalidFileFormatError`
Text content	For `type: "text"`, must have valid `text` string	`InvalidFileFormatError`
Image content	For `type: "image_url"`, must have valid `image_url` dict	`InvalidFileFormatError`
Image size	Base64 images must be under 10MB	`InvalidFileFormatError`
Image limit	Maximum 10 images per example	`InvalidFileFormatError`
Image role	Images only allowed in user messages	`InvalidFileFormatError`

Source: src/together/utils/files.py

Supported Content Types

# Text content
{"type": "text", "text": "The training prompt here"}

# Image URL content
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}

# Base64 image content
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}

Multimodal Dataset Structure

The validator supports multimodal datasets for vision model fine-tuning:

graph LR
    A[JSONL Line] --> B{Parse content}
    B -->|List| C[Validate each item]
    B -->|String| D[Plain text]
    
    C --> E{type == "text"?}
    C --> F{type == "image_url"?}
    
    E -->|Yes| G[Validate text field]
    F -->|Yes| H[Validate image_url dict]
    F -->|No| I[Error: Unknown type]
    
    H --> J{URL or Base64?}
    J -->|Base64| K[Check size < 10MB]
    K --> L[Count images]
    J -->|URL| L

Source: src/together/utils/files.py
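The rules in the validation table and flowchart can be sketched as a small validator for a single conversation. This is an illustrative reimplementation, not the SDK's code in src/together/utils/files.py: validate_messages and ValidationError are hypothetical names. The sketch accepts plain-string content (per the flowchart), and it assumes the 10MB limit means 10 × 1024 × 1024 characters of Base64 payload after the data: prefix.

```python
from typing import Any, Dict, List

MAX_IMAGES_PER_EXAMPLE = 10
MAX_BASE64_CHARS = 10 * 1024 * 1024  # assumed reading of "under 10MB"


class ValidationError(ValueError):
    """Raised when a conversation breaks one of the content rules."""


def validate_messages(messages: List[Dict[str, Any]]) -> None:
    image_count = 0
    for i, message in enumerate(messages):
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-text content needs no further checks
        if not isinstance(content, list) or not all(isinstance(c, dict) for c in content):
            raise ValidationError(f"message {i}: content must be a string or a list of dicts")
        for item in content:
            item_type = item.get("type")
            if item_type is None:
                raise ValidationError(f"message {i}: content item is missing 'type'")
            if item_type == "text":
                if not isinstance(item.get("text"), str):
                    raise ValidationError(f"message {i}: text item needs a 'text' string")
            elif item_type == "image_url":
                if message.get("role") != "user":
                    raise ValidationError(f"message {i}: images are only allowed in user messages")
                image_url = item.get("image_url")
                if not isinstance(image_url, dict) or not isinstance(image_url.get("url"), str):
                    raise ValidationError(f"message {i}: image item needs an 'image_url' dict with a 'url'")
                url = image_url["url"]
                if url.startswith("data:") and "," in url:
                    payload = url.split(",", 1)[1]  # Base64 part after the data: header
                    if len(payload) >= MAX_BASE64_CHARS:
                        raise ValidationError(f"message {i}: Base64 image is not under 10MB")
                image_count += 1
            else:
                raise ValidationError(f"message {i}: unknown content type {item_type!r}")
    if image_count > MAX_IMAGES_PER_EXAMPLE:
        raise ValidationError(
            f"example has {image_count} images; the limit is {MAX_IMAGES_PER_EXAMPLE}"
        )


good = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ]},
    {"role": "assistant", "content": "A cat on a sofa."},
]
validate_messages(good)  # passes silently
```

Raising on the first violation keeps the sketch short; a fuller checker, like together files check, would typically collect every problem in a report instead.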

Python Client Usage

Initialization

from together import Together

client = Together()

The client automatically reads the TOGETHER_API_KEY environment variable. You can also pass the key explicitly:

client = Together(api_key="your-api-key-here")

File Operations

#### Upload a File

response = client.files.upload(
    file="training_data.jsonl",  # path to the local file
    purpose="fine-tune",
)
print(response.id)

#### List Files

files = client.files.list()

for file in files.data:
    print(f"ID: {file.id}, Filename: {file.filename}, Size: {file.bytes}")

#### Retrieve File Metadata

file_info = client.files.retrieve(id="file-xxxxx")
print(f"Created: {file_info.created_at}")
print(f"Filename: {file_info.filename}")

#### Retrieve File Content

content = client.files.retrieve_content(id="file-xxxxx")
print(content)

#### Delete a File

result = client.files.delete(id="file-xxxxx")
print(result.deleted)

Source: src/together/filemanager.py

CLI Usage

The together files command provides a command-line interface for file operations.

Command Overview

together files --help

Command	Description
`together files check`	Validate a local file before uploading
`together files upload`	Upload a file to Together AI
`together files list`	List all uploaded files
`together files retrieve`	Get file metadata
`together files retrieve-content`	Download file content
`together files delete`	Delete a remote file

Source: README.md

Check File (Local Validation)

Validate your JSONL file locally before uploading:

together files check example.jsonl

This runs the same validation logic that the SDK uses, checking:

JSONL format validity
Content structure
Multimodal content rules
Image size limits

Upload a File

together files upload example.jsonl

List Files

together files list

Retrieve File Metadata

together files retrieve file-6f50f9d1-5b95-416c-9040-0799b2b4b894

Retrieve File Content

together files retrieve-content file-6f50f9d1-5b95-416c-9040-0799b2b4b894

Delete a Remote File

together files delete file-6f50f9d1-5b95-416c-9040-0799b2b4b894

Data Flow for Fine-tuning

The Files API integrates directly with the Fine-tuning API. Here's how files flow through the system:

sequenceDiagram
    participant User
    participant CLI as Files CLI
    participant SDK as Python SDK
    participant API as Together API
    participant FT as Fine-tuning
    
    User->>CLI: together files upload data.jsonl
    CLI->>SDK: client.files.upload()
    SDK->>SDK: Validate locally
    SDK->>API: POST /v1/files
    API-->>SDK: {id: "file-xxxxx"}
    SDK-->>CLI: File upload response
    
    User->>CLI: together fine-tuning create
    CLI->>SDK: client.fine_tuning.create(training_file="file-xxxxx")
    SDK->>API: POST /v1/fine-tunes
    API-->>SDK: {id: "ft-xxxxx"}
    SDK-->>CLI: Fine-tuning job response

Error Handling

The SDK defines several exception types for file-related errors:

Exception	Use Case
`TogetherException`	Base exception class
`FileTypeError`	Invalid file type or format
`APIConnectionError`	Network connectivity issues
`Timeout`	Request timeout

Source: src/together/error.py

Handling Upload Errors

from together import Together
from together.error import FileTypeError, APIConnectionError

client = Together()

try:
    response = client.files.upload(
        file="data.jsonl",
        purpose="fine-tune",
    )
except FileTypeError as e:
    print(f"Invalid file format: {e}")
except APIConnectionError as e:
    print(f"Connection error: {e}")

Common Issues

File Format Validation Failures

The local validation (together files check) should be run before uploading. This catches the most common issues:

Missing type field: Every content item must have a type field
Invalid type value: Must be either "text" or "image_url"
Missing text field: Text items must have a text string field
Image in non-user message: Images are only allowed in user roles
Base64 size exceeded: Images must be under 10MB when base64-encoded

Fine-tuning Integration

Files uploaded via the Files API can be used in fine-tuning jobs:

from together import Together

client = Together()

# Upload training file
training_file = client.files.upload(
    file="train.jsonl",
    purpose="fine-tune",
)

# Create fine-tuning job with uploaded file
job = client.fine_tuning.create(
    training_file=training_file.id,
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct"
)

Source: src/together/resources/finetune.py

Configuration Options

File Upload Parameters

Parameter	Type	Required	Description
`file`	path (`str` or `Path`)	Yes	Local file to upload
`purpose`	string	Yes	Intended use (e.g., `"fine-tune"`)

File Check Parameters

Parameter	Type	Required	Description
`file_path`	string	Yes	Path to local file

Best Practices

Always validate locally first: Run together files check before uploading to catch format errors early
Use descriptive filenames: Makes files easier to identify in the file list
Check file size: Large files may take longer to upload and process
Verify JSONL format: Ensure each line is valid JSON
Test with small dataset first: Validate your pipeline with a subset before full upload
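The "Verify JSONL format" and "Test with small dataset first" practices can be scripted with the standard library. This sketch reports the line numbers that do not parse as JSON objects and copies the first n lines to a smaller file for a trial run. find_invalid_jsonl_lines and write_subset are hypothetical helpers; treating blank lines as invalid is an assumption, and neither helper replaces together files check, which also applies the content rules.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def find_invalid_jsonl_lines(path: Path) -> List[int]:
    """Return 1-based line numbers that are not valid JSON objects."""
    bad: List[int] = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                bad.append(lineno)  # blank lines are treated as invalid here
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad.append(lineno)
                continue
            if not isinstance(record, dict):
                bad.append(lineno)  # valid JSON, but not an object
    return bad


def write_subset(src: Path, dst: Path, n: int) -> int:
    """Copy the first n lines of src to dst and return how many were written."""
    written = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if written >= n:
                break
            fout.write(line)
            written += 1
    return written


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.jsonl"
    data.write_text('{"text": "ok"}\nnot json\n[1, 2]\n{"text": "also ok"}\n', encoding="utf-8")
    print(find_invalid_jsonl_lines(data))                      # [2, 3]
    print(write_subset(data, Path(tmp) / "sample.jsonl", n=2))  # 2
```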

Fine-Tuning

The Fine-Tuning module in the Together Python SDK provides a comprehensive interface for customizing foundation models on the Together Inference API. This module enables developers to adapt pre-trained models to their specific use cases through supervised fine-tuning, LoRA (Low-Rank Adaptation), and advanced alignment methods like DPO (Direct Preference Optimization).

Overview

Fine-tuning transforms a pre-trained model into a specialized tool tailored for specific tasks, domains, or behaviors. The Together platform supports multiple fine-tuning methodologies:

Training Method	Description	Use Case
Full Training	Updates all model weights	Maximum customization, larger datasets
LoRA	Low-Rank Adaptation with adapter weights	Efficient fine-tuning, lower compute costs
DPO	Direct Preference Optimization	Alignment and preference learning
RPO	Relative Preference Optimization	Alternative alignment approach
SimPO	Simple Preference Optimization	Simplified alignment without reference model

Source: src/together/resources/finetune.py

Architecture

The fine-tuning system follows a layered architecture with the FineTuning class serving as the primary interface:

graph TD
    A[User Application] --> B[Together Client]
    B --> C[FineTuning Class]
    C --> D[APIRequestor]
    D --> E[Together Inference API]
    
    F[CLI Commands] --> C
    G[Legacy API] --> C
    
    H[File Validation] --> C
    I[Checkpoint Management] --> C
    J[Price Estimation] --> C

Core Components

Component	Location	Purpose
`FineTuning`	`resources/finetune.py`	Main API interface for fine-tuning operations
`FineTuneCreateRequest`	`types/finetune.py`	Request payload model for job creation
CLI Commands	`cli/api/finetune.py`	Command-line interface for fine-tuning
Legacy API	`legacy/finetune.py`	Backward-compatible wrapper functions
File Validation	`utils/files.py`	Dataset file format validation

Creating Fine-Tuning Jobs

Python Client

The FineTuning.create() method initiates a new fine-tuning job. The method accepts numerous parameters to customize the training process:

from together import Together

client = Together()

response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    validation_file="file-def456",
    n_epochs=3,
    batch_size=4,
    learning_rate=1e-5,
    suffix="my-custom-model",
    wandb_api_key="your-wandb-key",
    wandb_project_name="my-project",
)
print(response)

Source: src/together/resources/finetune.py

Supported Parameters

Parameter	Type	Default	Description
`model`	`str`	Required	Base model identifier (e.g., `meta-llama/Llama-3-8b-hf`)
`training_file`	`str`	Required	Uploaded training file ID
`validation_file`	`str`	Optional	Uploaded validation file ID
`n_epochs`	`int`	`3`	Number of training epochs
`n_checkpoints`	`int`	`1`	Number of checkpoints to save
`batch_size`	`int`	Auto	Training batch size
`learning_rate`	`float`	`1e-5`	Initial learning rate
`lr_scheduler_type`	`str`	`cosine`	Learning rate scheduler
`warmup_ratio`	`float`	`0.1`	Warmup ratio for learning rate
`weight_decay`	`float`	`0.01`	Weight decay coefficient
`max_grad_norm`	`float`	`1.0`	Maximum gradient norm
`suffix`	`str`	`None`	Custom suffix for output model name
`lora`	`bool`	`False`	Enable LoRA fine-tuning
`lora_r`	`int`	`8`	LoRA attention dimension
`lora_dropout`	`float`	`0.05`	LoRA dropout probability
`lora_alpha`	`int`	`16`	LoRA alpha parameter
`train_on_inputs`	`bool`	`None`	Mask user messages in training
`train_vision`	`bool`	`False`	Train vision encoder (multimodal models)
`training_method`	`str`	`sft`	Training method (dpo, rpo, simpo)
`from_checkpoint`	`str`	`None`	Resume from previous job checkpoint
`from_hf_model`	`str`	`None`	HuggingFace model to continue training from

Source: src/together/resources/finetune.py
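Two pieces of arithmetic help when choosing values from the parameter table. In standard LoRA, adapting a d×k weight matrix adds r·(d + k) trainable parameters, and the low-rank update is scaled by lora_alpha / lora_r. The number of optimizer steps is roughly ceil(examples / batch_size) × n_epochs, and warmup_ratio reserves that fraction of the steps for warmup. The helpers below are hypothetical, the 4096×4096 matrix and 10,000-example dataset are assumed examples, and Together's service may count steps differently.

```python
import math


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight (B: d x r, A: r x k)."""
    return r * (d + k)


def lora_scale(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def training_steps(num_examples: int, batch_size: int, n_epochs: int) -> int:
    """Approximate optimizer steps, counting a partial final batch as a step."""
    return math.ceil(num_examples / batch_size) * n_epochs


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Steps spent ramping the learning rate up (rounded down here, an assumption)."""
    return int(total_steps * warmup_ratio)


# Table defaults: lora_r=8, lora_alpha=16, n_epochs=3, warmup_ratio=0.1
added = lora_params(4096, 4096, r=8)
print(added, added / (4096 * 4096))  # 65536 0.00390625
print(lora_scale(16, 8))             # 2.0
steps = training_steps(10_000, batch_size=16, n_epochs=3)
print(steps, warmup_steps(steps, 0.1))  # 1875 187
```

With the defaults, each adapted 4096×4096 matrix trains under 0.4% as many parameters as full training would, which is the source of LoRA's lower compute cost.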

Async Support

For asynchronous applications, use AsyncTogether with the async FineTuning methods:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def create_ft_job():
    response = await async_client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
        n_epochs=3,
    )
    return response

result = asyncio.run(create_ft_job())

Source: src/together/resources/finetune.py

Managing Fine-Tuning Jobs

Job Lifecycle

stateDiagram-v2
    [*] --> Created: create()
    Created --> Queued: Submitted
    Queued --> Running: Started
    Running --> Completed: Success
    Running --> Failed: Error
    Running --> Cancelled: cancel()
    Queued --> Cancelled: cancel()

Listing Jobs

Retrieve all fine-tuning jobs associated with your account:

response = client.fine_tuning.list()
for job in response.data:
    print(f"ID: {job.id}, Model: {job.model}, Status: {job.status}")

Retrieving Job Details

Get detailed information about a specific fine-tuning job:

job = client.fine_tuning.retrieve(id="ft-job-abc123")
print(f"Status: {job.status}")
print(f"Training steps: {job.training_steps}")
print(f"Output model: {job.output_name}")

Cancelling Jobs

Abort a running or queued fine-tuning job:

result = client.fine_tuning.cancel(id="ft-job-abc123")

Source: src/together/resources/finetune.py
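A common pattern is to poll client.fine_tuning.retrieve() until the job reaches a terminal state. The sketch below takes the status lookup as a function so it runs offline. wait_for_job is a hypothetical helper, and the default terminal status names ("completed", "error", "cancelled") and the poll interval are assumptions rather than a documented list; pass your own set if the API reports other values.

```python
import time
from typing import Any, Callable, Iterable


def wait_for_job(
    get_status: Callable[[], Any],
    terminal_statuses: Iterable[str] = ("completed", "error", "cancelled"),
    poll_interval: float = 30.0,
    timeout: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status; return that status.

    Enum statuses are compared by their .value. Raises TimeoutError if the
    job has not finished within timeout seconds.
    """
    terminal = set(terminal_statuses)
    deadline = clock() + timeout
    while True:
        status = get_status()
        value = str(getattr(status, "value", status))
        if value in terminal:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"job still {value!r} after {timeout} seconds")
        sleep(poll_interval)


# Offline example: a fake job that is queued, runs for two polls, then completes
statuses = iter(["queued", "running", "running", "completed"])
sleeps = []
final = wait_for_job(lambda: next(statuses), poll_interval=5, sleep=sleeps.append)
print(final, sleeps)  # completed [5, 5, 5]
```

Against the API, the lookup would be something like `wait_for_job(lambda: client.fine_tuning.retrieve(id="ft-job-abc123").status)`.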

Checkpoint Management

Checkpoints enable resuming training from intermediate states and retrieving model weights for deployment.

Retrieving Checkpoints

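checkpoints = client.fine_tuning.list_checkpoints(id="ft-job-abc123")
for checkpoint in checkpoints:
    print(f"Name: {checkpoint.name}, Type: {checkpoint.type}, Created: {checkpoint.timestamp}")

The internal _parse_raw_checkpoints() helper converts raw checkpoint metadata into FinetuneCheckpoint objects. Intermediate checkpoints are named <job-id>:<step>, and the final checkpoint uses the job ID alone (id in this excerpt is the fine-tuning job ID):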

parsed_checkpoints = []
for checkpoint in checkpoints:
    step = checkpoint["step"]
    checkpoint_type = checkpoint["checkpoint_type"]
    checkpoint_name = (
        f"{id}:{step}" if "intermediate" in checkpoint_type.lower() else id
    )
    parsed_checkpoints.append(
        FinetuneCheckpoint(
            type=checkpoint_type,
            timestamp=checkpoint["created_at"],
            name=checkpoint_name,
        )
    )

Source: src/together/resources/finetune.py
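Following the naming convention in the excerpt above, a checkpoint reference for the from_checkpoint parameter can be built or split with two small helpers. checkpoint_ref and parse_checkpoint_ref are hypothetical; they only mirror the <job-id>:<step> convention and do not check that the checkpoint exists.

```python
from typing import Optional, Tuple


def checkpoint_ref(job_id: str, step: Optional[int] = None) -> str:
    """Return the job ID alone, or '<job-id>:<step>' for an intermediate checkpoint."""
    return job_id if step is None else f"{job_id}:{step}"


def parse_checkpoint_ref(ref: str) -> Tuple[str, Optional[int]]:
    """Split '<job-id>:<step>' into (job_id, step); a bare job ID gives step None."""
    job_id, sep, step = ref.rpartition(":")
    if not sep:
        return ref, None
    return job_id, int(step)


print(checkpoint_ref("ft-previous-job", 1000))       # ft-previous-job:1000
print(parse_checkpoint_ref("ft-previous-job:1000"))  # ('ft-previous-job', 1000)
print(parse_checkpoint_ref("ft-previous-job"))       # ('ft-previous-job', None)
```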

Download Checkpoints

Download fine-tuned model weights using the CLI:

# Download latest checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Download specific checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-step 1000

# Download with specific checkpoint type
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-type merged

#### Checkpoint Types

Type	Description	Applicable Training
`default`	Default output format	All
`merged`	Merged with base model (LoRA only)	LoRA
`adapter`	Adapter weights only (LoRA only)	LoRA
`model_output_path`	Full model output (Full only)	Full

Source: src/together/cli/api/finetune.py

Download Options

CLI Option	Description
`--output_dir`, `-o`	Output directory for downloaded files
`--checkpoint-step`, `-s`	Specific checkpoint step to download
`--checkpoint-type`	Checkpoint type (default, merged, adapter)

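The same download is available from the Python client:

from together import Together
from together.types.finetune import DownloadCheckpointType

client = Together()

result = client.fine_tuning.download(
    fine_tune_id="ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b",
    output="./model-output",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED,
)
print(f"Downloaded to: {result.filename}")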

CLI Commands

The Together CLI provides a comprehensive set of commands for fine-tuning operations:

Create a Fine-Tuning Job

together fine-tuning create \
    --model meta-llama/Llama-3-8b-hf \
    --training-file file-abc123 \
    --n-epochs 3 \
    --suffix my-custom-model

List Fine-Tuning Jobs

together fine-tuning list

Retrieve Job Details

together fine-tuning retrieve ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

Cancel a Job

# Prompts for confirmation before cancelling
together fine-tuning cancel ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

Delete a Job

together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Force deletion without confirmation
together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --force

Source: src/together/cli/api/finetune.py

Weights & Biases Integration

The SDK supports automatic logging to Weights & Biases for experiment tracking:

together fine-tuning create \
    --model meta-llama/Llama-3-8b-hf \
    --training-file file-abc123 \
    --wandb-api-key your-api-key \
    --wandb-project-name my-project \
    --wandb-name my-experiment-run

Parameter	Description
`--wandb-api-key`	Weights & Biases API key
`--wandb-project-name`	W&B project name
`--wandb-name`	W&B run name
`--wandb-base-url`	W&B base URL (for enterprise deployments)

Source: src/together/cli/api/finetune.py

File Format Requirements

Training and validation files must follow specific JSONL (JSON Lines) format requirements:

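Plain Text Format

{"text": "What is the capital of France?\nAnswer: Paris"}

Chat/Conversation Format

Each line holds one complete conversation in a messages list:

{"messages": [{"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris"}]}

Multimodal Format (with Images)

Image content items may appear only in user messages:

{"messages": [{"role": "user", "content": [{"type": "text", "text": "What is shown in this image?"}, {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}]}, {"role": "assistant", "content": "A chart of monthly sales."}]}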
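Serializing each conversation with json.dumps guarantees one valid JSON object per line, because json.dumps escapes "\n" inside strings instead of emitting a raw line break. A minimal sketch using only the standard library; write_jsonl and the sample conversations are illustrative, and each line holds one conversation under a messages key.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def write_jsonl(path: Path, records: Iterable[Dict]) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines in strings, so each record stays on one line
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


conversations: List[Dict] = [
    {"messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris"},
    ]},
    {"messages": [
        {"role": "user", "content": "And of Japan?\nAnswer briefly."},
        {"role": "assistant", "content": "Tokyo"},
    ]},
]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "train.jsonl"
    print(write_jsonl(out, conversations))                      # 2
    print(len(out.read_text(encoding="utf-8").splitlines()))  # 2
```

The resulting file can then be checked with together files check before uploading.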

Validation Rules

The file validation system enforces the following rules:

Rule	Error	Source
File must be valid JSONL	`InvalidFileFormatError`	`utils/files.py`
Content must be a list of dicts	`InvalidFileFormatError`	`utils/files.py`
Each item must have `type` field	`InvalidFileFormatError`	`utils/files.py`
Text items must have `text` field (string)	`InvalidFileFormatError`	`utils/files.py`
Image items must be in user messages only	`InvalidFileFormatError`	`utils/files.py`
Image items must have `image_url` dict	`InvalidFileFormatError`	`utils/files.py`

Source: src/together/utils/files.py

Error Handling

The fine-tuning module defines specific exception types for different failure scenarios:

Exception Types

Exception	Use Case
`TogetherException`	Base exception class
`RateLimitError`	API rate limit exceeded
`FileTypeError`	Invalid file format
`APIConnectionError`	Network connectivity issues
`Timeout`	Request timeout

Source: src/together/error.py

Error Response Model

The TogetherErrorResponse model describes an API error with message, type, param, and code fields:

from together.types.error import TogetherErrorResponse

error_response = TogetherErrorResponse(
    message="Invalid training file format",
    type="validation_error",
    param="training_file",
    code="INVALID_FORMAT"
)

Handling Errors

from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"Fine-tuning error: {e}")
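The handler above reports a rate-limit error and gives up. If you would rather retry automatically, one option is a generic exponential-backoff wrapper. The following is a minimal sketch; `call_with_backoff` is a hypothetical helper, and the client's own retry settings may already cover your case. Only retry errors that are safe to repeat. Retrying a job-creating call after a connection error could submit the job twice, so the SDK example in the comments retries only `RateLimitError`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    The wait before retry k (1-based) is min(max_delay, base_delay * 2**(k-1))
    plus up to 10% random jitter. The last exception is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Example with the SDK (assumes `client` from the snippet above):
# from together.error import RateLimitError
# response = call_with_backoff(
#     lambda: client.fine_tuning.create(
#         model="meta-llama/Llama-3-8b-hf", training_file="file-abc123"
#     ),
#     retry_on=(RateLimitError,),
# )
```

The injectable `sleep` parameter keeps the helper testable without real waiting.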

Legacy API

The SDK provides backward-compatible wrapper classes in the `together.legacy` package:

from together.legacy.finetune import Finetune

# Deprecated, but still functional
response = Finetune.create(
    training_file="file-abc123",
    model="meta-llama/Llama-3-8b-hf",
    n_epochs=3,
)

⚠️ Warning: The legacy functions emit deprecation warnings. Migrate to the new client.fine_tuning interface for new projects.

Source: src/together/legacy/finetune.py
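Python hides `DeprecationWarning` by default unless it is triggered directly by code in `__main__`. As a result, legacy calls buried in library code can go unnoticed during a migration. The sketch below assumes the legacy wrappers emit their warnings through the standard `warnings` module as `DeprecationWarning` (or a subclass). It records those warnings around a call. `legacy_create` is a hypothetical stand-in, not an SDK function.

```python
import warnings


def legacy_create(**kwargs):
    """Hypothetical stand-in for a deprecated legacy wrapper."""
    warnings.warn(
        "legacy_create is deprecated; use client.fine_tuning.create",
        DeprecationWarning,
        stacklevel=2,
    )
    return kwargs


def find_legacy_calls(fn, *args, **kwargs):
    """Run fn and return (result, deprecation messages it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" shows every occurrence instead of hiding or deduplicating it.
        warnings.simplefilter("always", DeprecationWarning)
        result = fn(*args, **kwargs)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages
```

For a test suite, running `python -W error::DeprecationWarning` turns every such warning into an exception, which makes leftover legacy calls fail loudly.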

Common Patterns

Resuming from Checkpoint

Continue training from a previous fine-tuning job. Pass `from_checkpoint` instead of `model`; the SDK rejects requests that set both:

response = client.fine_tuning.create(
    training_file="file-abc123",
    from_checkpoint="ft-previous-job:1000",  # resume from step 1000; the checkpoint defines the base model
)
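When checkpoint references are assembled in scripts, a small helper keeps the string format consistent. The sketch below assumes the `<job-id>:<step>` form used in the example above, with the step treated as optional. It also assumes that a bare job id refers to that job's final checkpoint. `CheckpointRef`, `format_checkpoint`, and `parse_checkpoint` are hypothetical names, not SDK APIs.

```python
from typing import NamedTuple, Optional


class CheckpointRef(NamedTuple):
    job_id: str
    step: Optional[int]  # None: no step given (assumed to mean the final checkpoint)


def format_checkpoint(job_id: str, step: Optional[int] = None) -> str:
    """Build a from_checkpoint string such as 'ft-previous-job:1000'."""
    if step is not None and step < 0:
        raise ValueError("step must be non-negative")
    return job_id if step is None else f"{job_id}:{step}"


def parse_checkpoint(ref: str) -> CheckpointRef:
    """Split '<job-id>:<step>' into its parts; a bare job id has no step."""
    job_id, sep, step = ref.rpartition(":")
    if not sep:
        return CheckpointRef(ref, None)
    if not job_id or not step.isdecimal():
        raise ValueError(f"expected '<job-id>:<step>', got {ref!r}")
    return CheckpointRef(job_id, int(step))
```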

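Fine-tuning from a Hugging Face Model

Start training from weights hosted on the Hugging Face Hub. `model` names the base model, which the Hub repository should match in architecture and size: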

response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    from_hf_model="username/my-finetuned-model",
    hf_model_revision="v1.0",
)

Training with Price Limits

The SDK includes price estimation to prevent unexpected costs:

price_estimation = client.fine_tuning.estimate_price(
    training_file="file-abc123",
    model="meta-llama/Llama-3-8b-hf",
    n_epochs=3,
    training_type="lora",
)

if price_estimation.allowed_to_proceed:
    response = client.fine_tuning.create(
        training_file="file-abc123",
        model="meta-llama/Llama-3-8b-hf",
        n_epochs=3,
    )
else:
    print(f"Estimated cost ${price_estimation.estimated_cost} exceeds limit")

Source: src/together/resources/finetune.py

Price Estimation

The price estimation feature helps users understand the expected cost before starting a fine-tuning job:

graph LR
    A[User Creates Job] --> B{from_checkpoint or from_hf_model?}
    B -->|No| C[Estimate Price]
    B -->|Yes| D[Skip Estimation]
    C --> E{Cost within limits?}
    E -->|Yes| F[Submit Job]
    E -->|No| G[Show Warning]
    D --> F

Price estimation is automatically performed when creating jobs without a checkpoint or HuggingFace model source, unless explicitly disabled.
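The branching in the diagram can be written as a small pure function. The following sketch mirrors the flowchart only; it is not the SDK's implementation. `Estimate` and `plan_submission` are hypothetical, and the field names `estimated_cost` and `allowed_to_proceed` follow the example earlier in this section. The estimator is passed as a callable so it only runs on the branch that needs it, just as estimation is skipped for checkpoint and Hugging Face sources.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Estimate:
    """Hypothetical stand-in for a price-estimation result."""
    estimated_cost: float
    allowed_to_proceed: bool


def plan_submission(
    from_checkpoint: Optional[str],
    from_hf_model: Optional[str],
    estimate: Callable[[], Estimate],
) -> str:
    """Follow the flowchart and return 'submit' or 'warn'."""
    if from_checkpoint is not None or from_hf_model is not None:
        return "submit"  # estimation is skipped for these sources
    result = estimate()  # only called when no checkpoint or HF source is set
    return "submit" if result.allowed_to_proceed else "warn"
```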

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Doramagic extracted 12 source-linked risk signals. Review them before installing or handing real data to the project.

1. Capability assumption: README/documentation is current enough for a first validation pass.

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.assumptions | github_repo:624113979 | https://github.com/togethercomputer/together-python | README/documentation is current enough for a first validation pass.

2. Project risk: v.1.5.31

Severity: medium
Finding: Project risk is backed by a source signal: v.1.5.31. Treat it as a review item until the current version is checked.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.31

3. Project risk: v.1.5.33

Severity: medium
Finding: Project risk is backed by a source signal: v.1.5.33. Treat it as a review item until the current version is checked.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.33

4. Project risk: v1.5.28

Severity: medium
Finding: Project risk is backed by a source signal: v1.5.28. Treat it as a review item until the current version is checked.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v1.5.28

5. Maintenance risk: v.1.5.29

Severity: medium
Finding: Maintenance risk is backed by a source signal: v.1.5.29. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.29

6. Maintenance risk: v1.5.27

Severity: medium
Finding: Maintenance risk is backed by a source signal: v1.5.27. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v1.5.27

7. Maintenance risk: Maintainer activity is unknown

Severity: medium
Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | last_activity_observed missing

8. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: downstream_validation.risk_items | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium

9. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: risks.scoring_risks | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium

10. Security or permission risk: `LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`

Severity: medium
Finding: Security or permission risk is backed by a source signal: LogProbs.top_logprobs typed as Dict but API returns List[Dict]. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/issues/443
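Until the type is confirmed for your installed version, code that reads `top_logprobs` from raw JSON or from a dumped response can accept either shape. The following minimal sketch assumes, per the issue title, that the value is either one dict or a list of dicts. `normalize_top_logprobs` is a hypothetical helper, and the inner dict contents are left untyped because the issue does not specify them.

```python
from typing import Any, Dict, List, Union

TopLogprobs = Union[None, Dict[str, Any], List[Dict[str, Any]]]


def normalize_top_logprobs(value: TopLogprobs) -> List[Dict[str, Any]]:
    """Return top_logprobs as a list of dicts, whichever shape arrived."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]  # a single mapping is wrapped as a one-element list
    if isinstance(value, list) and all(isinstance(v, dict) for v in value):
        return list(value)
    raise TypeError(f"unexpected top_logprobs shape: {type(value).__name__}")
```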

11. Maintenance risk: issue_or_pr_quality=unknown

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | issue_or_pr_quality=unknown

12. Maintenance risk: release_recency=unknown

Severity: low
Finding: release_recency=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.


Doramagic exposes project-level community discussion separately from official documentation. Review these links before using together-python with real data or production workflows.

[LogProbs.top_logprobs typed as Dict but API returns List[Dict]](https://github.com/togethercomputer/together-python/issues/443) - github / github_issue
v1.5.35 - github / github_release
v.1.5.33 - github / github_release
v.1.5.31 - github / github_release
v.1.5.29 - github / github_release
v1.5.28 - github / github_release
v1.5.27 - github / github_release
README/documentation is current enough for a first validation pass. - GitHub / issue

Source: Project Pack community evidence and pitfall evidence
