Doramagic Project Pack · Human Manual

together-python

Related topics: Installation and Setup, Client Architecture

Overview

The together-python repository is an official Python SDK and Command Line Interface (CLI) for interacting with the Together AI API. It provides developers with programmatic access to a wide range of large language models (LLMs), image generation models, embedding services, and fine-tuning capabilities hosted on the Together platform.

Source: README.md

Source: https://github.com/togethercomputer/together-python / Human Manual

Installation and Setup

Related topics: Overview

This page covers the complete installation and setup process for the Together Python SDK (together-python), including prerequisites, configuration options, CLI setup, and development environment configuration.

Overview

The together-python SDK provides a Python interface and command-line tool for interacting with the Together AI API. It enables developers to:

  • Access chat completions with support for multimodal inputs (text and images)
  • Generate text completions
  • Create and manage fine-tuning jobs
  • Generate images
  • Compute embeddings and reranking
  • Manage files and model resources

Source: README.md

Prerequisites

Python Version Requirements

The SDK requires Python 3.10 or higher. The project uses modern Python features including type hints and async/await patterns.
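
A script that depends on the SDK can check the interpreter version up front instead of failing later on an import. The following is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of the SDK.

```python
import sys

MINIMUM_PYTHON = (3, 10)  # minimum version stated above


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print(
        f"Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]}+ is required, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
```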

API Key

A valid Together AI API key is required for all API operations. You can obtain an API key by:

  1. Creating an account at api.together.ai
  2. Navigating to the API keys settings page

Source: README.md

Installation Methods

Standard Installation (pip)

Install the latest stable release from PyPI:

pip install together

Poetry Installation

For projects using Poetry as the dependency manager:

poetry add together

Source: CONTRIBUTING.md

Development Installation

For contributors who want to modify the source code or run tests locally:

# Clone the repository
git clone https://github.com/togethercomputer/together-python.git
cd together-python

# Install with development dependencies
poetry install --with quality,tests

Source: CONTRIBUTING.md

Configuration

Environment Variables

The SDK supports configuration through environment variables. The primary variable required is:

| Environment Variable | Description | Required |
| --- | --- | --- |
| `TOGETHER_API_KEY` | Your Together AI API key | Yes |

#### Setting the API Key

Unix/Linux/macOS:

export TOGETHER_API_KEY=xxxxx

Windows (Command Prompt):

set TOGETHER_API_KEY=xxxxx

Windows (PowerShell):

$env:TOGETHER_API_KEY="xxxxx"

Source: README.md
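
Because every API operation needs the key, it can help to fail early with a clear message when the variable is missing. The sketch below uses only the standard library; `require_api_key` is a hypothetical helper, not an SDK function.

```python
import os


def require_api_key(environ=os.environ, name="TOGETHER_API_KEY"):
    """Return the API key from `environ`, or raise with a clear message."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it before creating the client."
        )
    return key
```

The returned value could then be passed to the client explicitly, for example `Together(api_key=require_api_key())`.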

Client Configuration

The Python client can be initialized with or without an explicit API key:

Using environment variable (recommended):

from together import Together

client = Together()  # Automatically reads TOGETHER_API_KEY

Explicit API key:

from together import Together

client = Together(api_key="your-api-key-here")

Source: README.md

Optional Dependencies

The SDK uses Poetry for dependency management. Some features require optional dependencies:

| Extra | Purpose | Install Command |
| --- | --- | --- |
| `extended_testing` | Additional testing dependencies | `poetry install --with extended_testing` |

When adding new dependencies, maintainers follow a strict policy: dependencies should be optional and users who don't have them installed should be able to import the SDK without warnings or errors.

Source: CONTRIBUTING.md

CLI Setup

Installation

The CLI is included with the main package installation. After installing together, the together command becomes available.

Verification

Verify the CLI installation:

together --help

Common CLI Commands

| Command | Description |
| --- | --- |
| `together chat.completions` | Chat completions |
| `together completions` | Text completions |
| `together images` | Image generation |
| `together files` | File management |
| `together fine-tuning` | Fine-tuning operations |
| `together models` | List and manage models |
Source: README.md

Client Initialization Patterns

Synchronous Client

from together import Together

client = Together()

Asynchronous Client

from together import AsyncTogether

async_client = AsyncTogether()

Basic Usage Flow

graph TD
    A[Install together package] --> B[Set TOGETHER_API_KEY]
    B --> C[Import Together or AsyncTogether]
    C --> D[Initialize client]
    D --> E[Call API methods]
    E --> F[Process response]

SDK Constants

The SDK defines several constants in src/together/constants.py:

| Constant | Purpose |
| --- | --- |
| API base URLs | Endpoint configurations |
| Default timeouts | Request timeout values |
| Version information | SDK version tracking |

Source: src/together/constants.py

Error Handling Setup

The SDK provides a comprehensive error hierarchy for handling API-related issues:

Exception Types

| Exception Class | Purpose |
| --- | --- |
| `TogetherException` | Base exception class |
| `RateLimitError` | Handle rate limiting |
| `FileTypeError` | File format validation errors |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout handling |
| `AuthenticationError` | Invalid API key errors |
Source: src/together/error.py

Error Response Format

API errors are returned with structured information:

class TogetherErrorResponse(BaseModel):
    message: str | None = None      # Error message
    type: str | None = None         # Error type
    param: str | None = None        # Parameter causing error
    code: str | None = None         # Error code

Source: src/together/types/error.py

Error Handling Example

from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"API error: {e}")

Known Compatibility Issues

Typer Version Conflict

Note: The SDK has a dependency constraint on typer<0.16.0. If your project requires typer>=0.16.0, you may encounter dependency conflicts. See Issue #348 for tracking.

This is a known community issue where projects depending on newer typer versions cannot use together-python without resolving the conflict.

Pillow Version

Note: The SDK's image processing may have transitive dependency issues with pillow>=11.0.0 when used alongside libraries like autogen 0.4.2. See Issue #237 for details.

Development Environment Setup

1. Install Poetry

Follow the official Poetry installation guide.

Important: If you use Conda or Pyenv, create and activate a new environment before installing Poetry:
```bash
conda create -n together python=3.10
conda activate together
```

2. Configure Poetry

Tell Poetry to use the active Python environment:

poetry config virtualenvs.prefer-active-python true

3. Install Dependencies

poetry install --with quality,tests

4. Set Up Pre-commit Hooks

The project uses pre-commit for auto-formatting and linting:

pre-commit install

Source: CONTRIBUTING.md

Running Tests

#### Unit Tests

make tests

#### Integration Tests

Warning: Integration tests require an active API key and will incur charges.

make integration_tests

Source: CONTRIBUTING.md

Formatting and Linting

Before submitting changes, run formatting locally:

make format

The CI system automatically checks formatting, linting, and tests.

Source: CONTRIBUTING.md

Quick Start Checklist

| Step | Task | Command/Action |
| --- | --- | --- |
| 1 | Check Python version | `python --version` (requires 3.10+) |
| 2 | Install SDK | `pip install together` |
| 3 | Set API key | `export TOGETHER_API_KEY=xxxxx` |
| 4 | Verify installation | `python -c "from together import Together; print('OK')"` |
| 5 | Test basic call | Run a simple chat completion |


Client Architecture

Related topics: Chat Completions, Type System

Overview

The Together Python SDK provides a unified interface for interacting with the Together AI platform through both a programmatic Python client and a command-line interface (CLI). The client architecture follows a layered design pattern that separates concerns between API communication, resource management, and user-facing interfaces.

The architecture is designed to support multiple API capabilities including chat completions, text completions, embeddings, image generation, file management, and fine-tuning operations. Source: src/together/resources/__init__.py

Core Components

The SDK architecture consists of three primary layers that work together to provide a seamless developer experience:

graph TD
    A[User Application] --> B[Together Client]
    B --> C[Resource Modules]
    C --> D[API Requestor]
    D --> E[Together AI API]
    E --> D
    D --> B
    B --> A
    
    F[CLI Commands] --> B
    
    subgraph Resources
        G[Chat Completions]
        H[Completions]
        I[Embeddings]
        J[Images]
        K[Files]
        L[Fine-tuning]
    end
    
    C --> G
    C --> H
    C --> I
    C --> J
    C --> K
    C --> L

Together Client Class

The Together class serves as the main entry point for the SDK. It provides a synchronous interface for all API operations and manages the underlying HTTP client configuration.

Key Responsibilities:

  • Initialization and configuration of API credentials
  • Delegation of requests to appropriate resource modules
  • Streaming response handling
  • Timeout and connection management

Source: src/together/client.py

Basic Initialization:

from together import Together

# Using environment variable (TOGETHER_API_KEY)
client = Together()

# Explicit API key
client = Together(api_key="your-api-key-here")

API Requestor

The APIRequestor class handles the low-level communication with the Together AI API. It abstracts away HTTP details and provides a consistent interface for both synchronous and asynchronous operations.

Requestor Responsibilities:

  • Constructing HTTP requests with proper authentication headers
  • Handling request serialization and response parsing
  • Managing streaming responses
  • Implementing retry logic for transient failures
  • Processing error responses into typed exceptions

Source: src/together/abstract/api_requestor.py

Resource Modules

Resource modules encapsulate API operations by domain. Each resource module provides type-safe methods for a specific category of API endpoints.

| Resource Module | Purpose | Key Methods |
| --- | --- | --- |
| `chat.completions` | Chat-based language model interactions | `create()`, streaming variants |
| `completions` | Text completion operations | `create()`, streaming variants |
| `embeddings` | Text embedding generation | `create()` |
| `images` | Image generation | `generate()` |
| `files` | File upload, retrieval, and management | `upload()`, `retrieve()`, `list()`, `delete()` |
| `fine_tuning` | Model fine-tuning operations | `create()`, `retrieve()`, `list()`, `cancel()`, `download()` |

Source: src/together/resources/__init__.py

Request/Response Flow

Synchronous Request Flow

sequenceDiagram
    participant App as Application Code
    participant Client as Together Client
    participant Resource as Resource Module
    participant Requestor as API Requestor
    participant API as Together AI API

    App->>Client: client.chat.completions.create(...)
    Client->>Resource: delegating request
    Resource->>Resource: build request parameters
    Resource->>Requestor: request()
    Requestor->>API: POST /chat/completions
    API-->>Requestor: JSON Response
    Requestor->>Resource: parse response
    Resource-->>Client: typed response object
    Client-->>App: ChatCompletionResponse
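
The layering in the diagram above can be illustrated with a toy model that runs offline. This is only a sketch of the pattern: the `Sketch*` classes, the `fake_transport` function, and the payload shape are all hypothetical and do not reproduce the SDK's actual classes or internals.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# A transport takes (method, url, payload) and returns decoded JSON.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class SketchRequestor:
    """Lowest layer: attaches auth headers and calls the transport."""
    api_key: str
    transport: Transport

    def request(self, method: str, url: str, params: Dict[str, Any]) -> Dict[str, Any]:
        payload = {
            "headers": {"Authorization": f"Bearer {self.api_key}"},
            "json": params,
        }
        return self.transport(method, url, payload)


class SketchChatCompletions:
    """Resource layer: builds parameters and unwraps the raw response."""

    def __init__(self, requestor: SketchRequestor) -> None:
        self._requestor = requestor

    def create(self, model: str, messages: list, **options: Any) -> str:
        params = {"model": model, "messages": messages, **options}
        data = self._requestor.request("POST", "chat/completions", params)
        return data["choices"][0]["message"]["content"]


class SketchClient:
    """Entry point: wires the resource to one shared requestor."""

    def __init__(self, api_key: str, transport: Transport) -> None:
        self.chat_completions = SketchChatCompletions(
            SketchRequestor(api_key, transport)
        )


def fake_transport(method: str, url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the HTTP call; echoes the last user message."""
    last = payload["json"]["messages"][-1]["content"]
    return {"choices": [{"message": {"role": "assistant", "content": f"echo: {last}"}}]}


client = SketchClient("test-key", fake_transport)
reply = client.chat_completions.create(
    model="example-model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(reply)  # echo: Hello
```

Injecting the transport keeps the resource layer testable without network access, which is one reason a layered design of this kind is convenient.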

Streaming Response Handling

The SDK supports server-sent events (SSE) streaming for real-time token delivery. Streaming is handled differently depending on the API endpoint:

Chat Completions Streaming:

from together import Together

client = Together()
stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Source: src/together/resources/chat/completions.py

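The streaming implementation yields ChatCompletionChunk objects as you iterate over the response stream. With the synchronous Together client this is an ordinary for loop, as shown above; with AsyncTogether the stream is an async generator and is consumed with async for.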

Asynchronous Support

The SDK provides AsyncTogether for applications requiring concurrent API operations:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def concurrent_requests():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Prompt {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    return responses

Error Handling

The SDK defines a hierarchy of exception types for different error conditions, enabling precise error handling in application code.

Exception Hierarchy

TogetherException (base)
├── RateLimitError
├── FileTypeError
├── AttributeError
├── Timeout
└── APIConnectionError

Source: src/together/error.py

Error Response Model

API error responses are parsed into structured TogetherErrorResponse objects:

| Field | Type | Description |
| --- | --- | --- |
| `message` | `str \| None` | Human-readable error message |
| `type` | `str \| None` | Error category/type |
| `param` | `str \| None` | Parameter that caused the error |
| `code` | `str \| None` | Machine-readable error code |

Source: src/together/types/error.py

Error Handling Example

from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limited: {e.message}")
except TogetherException as e:
    print(f"API error: {e.message}")

File Validation Architecture

The SDK includes robust file validation for fine-tuning datasets, ensuring data integrity before upload.

graph LR
    A[Input File] --> B{File Type Check}
    B -->|JSONL| C[JSONL Validator]
    B -->|JSON| D[JSON Validator]
    C --> E{Content Validation}
    D --> E
    E --> F[Schema Validation]
    F --> G[Size Limits Check]
    G --> H[Upload Ready]
    E -->|Invalid| I[InvalidFileFormatError]

Validation Rules

The file validation system enforces the following constraints:

| Rule | Limit | Description |
| --- | --- | --- |
| Maximum base64 image size | 10MB | Per image in multimodal datasets |
| Maximum images per example | 5 | Images allowed in a single training example |
| Required fields | `type`, `content` | For each message in multimodal format |

Source: src/together/utils/files.py

Supported Content Types

| Type | Description | Role Restrictions |
| --- | --- | --- |
| `text` | Plain text content | Any role |
| `image_url` | Base64-encoded image | User role only |
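
Some of these rules can be checked locally before a dataset is uploaded. The sketch below is a hypothetical pre-check and not the SDK's validator: `check_example` and its messages are illustrative, it assumes each example is a list of message dicts with `role` and `content` keys, and it does not attempt the base64 size check.

```python
MAX_IMAGES_PER_EXAMPLE = 5  # limit taken from the Validation Rules table


def check_example(messages):
    """Return a list of problems found in one training example."""
    problems = []
    image_count = 0
    for m_idx, message in enumerate(messages):
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-text message, no content items to inspect
        for i_idx, item in enumerate(content or []):
            where = f"message {m_idx}, item {i_idx}"
            if not isinstance(item, dict) or "type" not in item:
                problems.append(f"{where}: expected a dict with a 'type' field")
            elif item["type"] == "text":
                if not isinstance(item.get("text"), str):
                    problems.append(f"{where}: 'text' must be a string")
            elif item["type"] == "image_url":
                image_count += 1
                if message.get("role") != "user":
                    problems.append(f"{where}: only user messages can contain images")
            else:
                problems.append(f"{where}: unsupported type {item['type']!r}")
    if image_count > MAX_IMAGES_PER_EXAMPLE:
        problems.append(
            f"{image_count} images exceed the limit of {MAX_IMAGES_PER_EXAMPLE}"
        )
    return problems
```

An empty list means the example passed these particular checks; the SDK's own validation remains the authority on what is accepted.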

Fine-tuning Architecture

The fine-tuning module provides comprehensive support for training custom models on the Together platform.

Training Methods

The SDK supports multiple fine-tuning methodologies:

| Method | Description | Checkpoint Types |
| --- | --- | --- |
| Full training | Updates all model weights | Default only |
| LoRA | Low-rank adaptation | Default, Merged, Adapter |
| DPO | Direct Preference Optimization | Default |
| SimPO | Simple Preference Optimization | Default |
| RPO | Reward Preference Optimization | Default |

Source: src/together/resources/finetune.py

Checkpoint Management

The fine-tuning resource handles checkpoint retrieval and download:

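from together import Together
from together.types.finetune import DownloadCheckpointType

client = Together()
fine_tune_id = "ft-..."  # placeholder: ID of an existing fine-tuning job

# List available checkpoints
checkpoints = client.fine_tuning.retrieve_checkpoints(fine_tune_id)

# Download specific checkpoint
result = client.fine_tuning.download(
    fine_tune_id,
    output="./checkpoints",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED
)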

CLI Architecture

The command-line interface is built using Click and mirrors the Python client functionality.

CLI Command Structure

together
├── chat.completions
├── completions
├── embeddings
├── files
│   ├── check
│   ├── upload
│   ├── list
│   ├── retrieve
│   └── delete
├── fine-tuning
│   ├── create
│   ├── list
│   ├── retrieve
│   ├── cancel
│   ├── download
│   └── delete
└── models
    ├── list
    └── start

Source: src/together/cli/api/chat.py and src/together/cli/api/finetune.py

CLI Configuration

The CLI supports environment variable configuration:

# Set API key
export TOGETHER_API_KEY=your-api-key

# Use CLI: each --message takes a role followed by the content
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Hello, world!"

Known Issues and Limitations

Dependency Compatibility

Issue #348: The SDK has a dependency constraint on typer<0.16.0, which may conflict with projects requiring newer versions of typer. This can cause dependency resolution failures in environments where multiple packages have conflicting typer requirements.

Issue #237: The pillow dependency version may conflict with transitive dependencies from other packages like autogen>=0.4.2 that require pillow>=11.0.0.

Model Type Validation

Issue #337: The ModelObject type definition may not include all valid model types, potentially causing Pydantic validation errors when working with newer or specialized model types like transcription models.

Tool Response Handling

Issue #113: Multi-turn function calling workflows may encounter validation errors when processing tool response messages with role='tool'. Applications implementing function calling should ensure proper message formatting according to the Together AI API specification.

Best Practices

Connection Management

  • Reuse the Together client instance across multiple requests to benefit from connection pooling
  • Set appropriate timeout values for long-running operations like fine-tuning

Error Recovery

  • Implement exponential backoff for RateLimitError handling
  • Validate file contents locally before upload to avoid wasted API calls
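
Exponential backoff can be sketched as a small wrapper that retries a callable with growing delays. This is an illustrative helper, not SDK functionality: `call_with_backoff` and the `RateLimited` stand-in class are hypothetical, and the delay values are arbitrary defaults.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for the SDK's RateLimitError so the sketch runs offline."""


def call_with_backoff(func, retry_on=(RateLimited,), max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `func`, retrying with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out when many callers back off at once.
            sleep(delay + random.uniform(0, delay / 10))
```

With the real SDK, one would pass `retry_on=(RateLimitError,)` imported from `together.error` and wrap the API call in a lambda or function.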

Streaming Performance

  • Process streaming chunks incrementally rather than buffering entire responses
  • Use async variants (AsyncTogether) for applications making multiple concurrent requests
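
Incremental processing can be sketched as a generator that hands each text fragment to the caller as soon as its chunk arrives. The chunk shape (`choices[0].delta.content`) follows the streaming example earlier in this page; `iter_text` and `fake_stream` are hypothetical names, and the fake stream only stands in for a real response so the sketch runs offline.

```python
from types import SimpleNamespace


def iter_text(stream):
    """Yield each non-empty text fragment as soon as its chunk arrives."""
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:
            yield text


def fake_stream(parts):
    """Stand-in for a streaming response; builds chunk-shaped objects."""
    for part in parts:
        delta = SimpleNamespace(content=part)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


for fragment in iter_text(fake_stream(["Hel", None, "lo"])):
    print(fragment, end="", flush=True)
print()
```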


Type System

Related topics: Client Architecture

The together-python SDK employs a comprehensive type system built on Pydantic for data validation, serialization, and API interaction. This document provides a detailed reference for developers working with the SDK's type definitions, error handling, and validation patterns.

Overview

The type system serves three primary purposes within the together-python SDK:

  1. Data Validation: Ensures API request parameters meet expected formats before transmission
  2. Serialization: Converts Python objects to JSON for API communication and deserializes responses
  3. IDE Support: Provides type hints for better developer experience and autocomplete

graph TD
    A[User Code] --> B[Pydantic Models]
    B --> C{Validation}
    C -->|Pass| D[API Request]
    C -->|Fail| E[Validation Error]
    D --> F[API Response]
    F --> G[Response Models]
    G --> H[User Code]

Base Types

Abstract Base Model

All SDK types inherit from BaseModel, which extends Pydantic's BaseModel with custom configuration:

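# Source: src/together/types/abstract.py
# Imports added so the excerpt stands on its own; the alias avoids the name clash.
from pydantic import BaseModel as PydanticBaseModel, ConfigDict

class BaseModel(PydanticBaseModel):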
    """Base model for all Together API types."""
    
    model_config = ConfigDict(
        populate_by_name=True,
        validate_default=True,
        arbitrary_types_allowed=True,
    )

The BaseModel configures:

  • populate_by_name=True: Allows population by field name or alias
  • validate_default=True: Validates default values during initialization
  • arbitrary_types_allowed=True: Permits custom type annotations

Error Response Model

The TogetherErrorResponse type defines the structure for API error responses:

| Field | Type | Description |
| --- | --- | --- |
| `message` | `str \| None` | Human-readable error message |
| `type` | `str \| None` | Error category/type |
| `param` | `str \| None` | Parameter that caused the error |
| `code` | `str \| None` | Machine-readable error code |

# Source: src/together/types/error.py
class TogetherErrorResponse(BaseModel):
    message: str | None = None
    type_: str | None = Field(None, alias="type")
    param: str | None = None
    code: str | None = None

Exception Hierarchy

The SDK defines a hierarchical exception system for granular error handling:

graph TD
    A[TogetherException<br/>Base Exception] --> B[RateLimitError]
    A --> C[FileTypeError]
    A --> D[AttributeError]
    A --> E[Timeout]
    A --> F[APIConnectionError]
    A --> G[InvalidRequestError]
    A --> H[AuthenticationError]
    A --> I[APIResponseError]

Exception Types

| Exception Class | Purpose | Common Cause |
| --- | --- | --- |
| `TogetherException` | Base exception for all SDK errors | General failures |
| `RateLimitError` | API rate limit exceeded | Too many requests |
| `FileTypeError` | Invalid file type submitted | Unsupported file format |
| `AttributeError` | Invalid attribute access | Missing or invalid parameter |
| `Timeout` | Request timeout | Slow network or API |
| `APIConnectionError` | Network connectivity issue | Connection failure |
| `InvalidRequestError` | Malformed request | Invalid parameters |
| `AuthenticationError` | Authentication failure | Invalid API key |
| `APIResponseError` | Unexpected API response | Server-side error |

# Source: src/together/error.py
class RateLimitError(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)

Exception Construction Pattern

All exception types accept flexible message parameters:

# Source: src/together/error.py
class Timeout(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)

The message can be:

  • TogetherErrorResponse: Parsed API error response
  • Exception: Wrapped exception
  • str: Direct error message
  • RequestException: HTTP request exception
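
Code that logs or displays errors often needs to reduce these message forms to a single line. The following is a minimal sketch under stated assumptions: `ErrorBody` is a local stand-in with the same four fields as TogetherErrorResponse, `describe` is a hypothetical helper, and neither reflects how the SDK formats messages internally.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ErrorBody:
    """Stand-in with the same four fields as TogetherErrorResponse."""
    message: Optional[str] = None
    type: Optional[str] = None
    param: Optional[str] = None
    code: Optional[str] = None


def describe(message: Union[ErrorBody, Exception, str, None]) -> str:
    """Reduce any accepted message form to one printable line."""
    if message is None:
        return "unknown error"
    if isinstance(message, ErrorBody):
        parts = [message.message or "no message"]
        if message.code:
            parts.append(f"code={message.code}")
        if message.param:
            parts.append(f"param={message.param}")
        return " ".join(parts)
    if isinstance(message, Exception):
        # Wrapped exceptions keep their class name for context.
        return f"{type(message).__name__}: {message}"
    return str(message)
```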

Request and Response Types

Chat Completions Types

The chat completions system uses structured types for requests and responses:

# Source: src/together/resources/chat/completions.py
response, _, _ = await requestor.arequest(
    options=TogetherRequest(
        method="POST",
        url="chat/completions",
        params=parameter_payload,
    ),
    stream=stream,
)

if stream:
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)

Streaming Response Types

Streaming responses yield ChatCompletionChunk objects:

| Field | Type | Description |
| --- | --- | --- |
| `choices` | `List[Choice]` | Generated completions |
| `model` | `str` | Model identifier |
| `id` | `str` | Request identifier |
| `usage` | `Usage` | Token usage statistics |

File Validation Types

Content Item Types

The SDK validates file content for fine-tuning datasets:

# Source: src/together/utils/files.py
if item["type"] == "text":
    if "text" not in item or not isinstance(item["text"], str):
        raise InvalidFileFormatError(
            "The dataset is malformed, the `text` field must be present in the `content` item field and be"
            f" a string. Got '{item.get('text')!r}' instead.",
            line_number=idx + 1,
            error_source="key_value",
        )
elif item["type"] == "image_url":
    if role != "user":
        raise InvalidFileFormatError(
            "The dataset is malformed, only user messages can contain images.",
            line_number=idx + 1,
            error_source="key_value",
        )

Content Type Enumeration

| Type | Valid Context | Description |
| --- | --- | --- |
| `text` | Any role | Plain text content |
| `image_url` | User role only | Image URL reference |

Common Issues and Troubleshooting

Validation Errors

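Pydantic validation errors occur when data doesn't match the expected types. The example below is raised while parsing a ModelObject whose type value is not in the SDK's enumeration:

pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelObject
type
  Input should be 'chat', 'language', 'code', 'image', 'embedding',...

Resolution: The failing field is the model's type rather than its name, so this error usually points to a model type that the installed SDK version does not recognize (see Issue #337 under Known Issues and Limitations). Checking for a newer SDK release is a reasonable first step.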

Invalid File Format Errors

When uploading fine-tuning datasets, content validation enforces strict rules:

# Source: src/together/utils/files.py
if not isinstance(item, dict):
    raise InvalidFileFormatError(
        "The dataset is malformed, the `content` field must be a list of dicts.",
        line_number=idx + 1,
        error_source="key_value",
    )

Type Mismatch in Streaming

When processing streaming responses, type assertions ensure correct handling:

# Source: src/together/cli/api/completions.py
if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices

Type Annotations in CLI

The CLI uses Click decorators with type annotations for command-line argument validation:

# Source: src/together/cli/api/chat.py
@click.option(
    "--max-tokens",
    type=int,
    help="Max tokens to generate"
)
@click.option(
    "--temperature",
    type=float,
    help="Sampling temperature"
)
@click.option(
    "--stop",
    type=str,
    multiple=True,
    help="List of strings to stop generation"
)

CLI Type Conversion

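| CLI Option Type | Python Type | Notes |
| --- | --- | --- |
| `type=int` | `int` | Integer values |
| `type=float` | `float` | Decimal values |
| `type=str` | `str` | String values |
| `multiple=True` | `tuple` | Multiple values |
| `is_flag=True` | `bool` | Boolean flags |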

Async Type Handling

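The async client returns the same response types as the synchronous one. When stream is set, it returns an async generator of ChatCompletionChunk objects, which is consumed with async for: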

# Source: src/together/resources/chat/completions.py
if stream:
    assert not isinstance(response, TogetherResponse)
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)

Best Practices

Type Safety Guidelines

  1. Use Response Models: Always use SDK response models instead of raw dictionaries
  2. Validate Early: Check input types before API calls
  3. Handle Exceptions: Catch specific exception types for targeted error handling
  4. Use Type Hints: Enable IDE autocomplete with proper imports

Importing Types

from together.types.error import TogetherErrorResponse
from together.error import (
    TogetherException,
    RateLimitError,
    InvalidRequestError,
    Timeout,
)


Chat Completions

Related topics: Completions API, Client Architecture

The Chat Completions API provides a unified interface for interacting with large language models on the Together platform through conversational message-based interactions. This feature supports text-only and multimodal inputs, streaming responses, function calling, and various generation parameters to control model behavior.

Overview

The Chat Completions resource is the primary interface for conversational AI interactions in the together-python SDK. It follows the OpenAI-compatible chat completions format, enabling developers to switch between providers with minimal code changes while leveraging Together's distributed inference infrastructure.

graph TD
    A[Client Application] --> B[Together Client]
    B --> C[Chat Completions.create]
    C --> D[API Requestor]
    D --> E[Together API]
    E --> F[Model Inference]
    F --> G[Response]
    G --> D
    D --> B
    B --> H[ChatCompletionResponse]
    
    style A fill:#e1f5fe
    style H fill:#c8e6c9

Key capabilities include:

  • Text Completions: Standard conversational text generation with system, user, and assistant roles
  • Multimodal Input: Support for images alongside text in user messages
  • Streaming: Real-time token-by-token response streaming
  • Function Calling: Tool-use with structured function definitions and responses
  • Safety Controls: Built-in moderation model integration
  • Audio Support: Attach audio URLs to messages for Whisper-transcribed context

Source: src/together/resources/chat/completions.py:1-50

Installation and Setup

Environment Configuration

The SDK requires a Together API key for authentication. You can obtain one from the Together Playground settings page.

export TOGETHER_API_KEY=your_api_key_here

Client Initialization

from together import Together

# Using environment variable
client = Together()

# Explicit API key
client = Together(api_key="your_api_key_here")

# Custom base URL, e.g. for testing (include the API version path)
client = Together(
    api_key="your_api_key_here",
    base_url="https://api.together.xyz/v1"
)

Source: README.md

API Reference

Method Signature

ChatCompletions.create(
    model: str,
    messages: List[ChatCompletionMessageParam],
    frequency_penalty: Optional[float] = None,
    max_tokens: Optional[int] = None,
    n: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    stop: Optional[Union[str, List[str]]] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    min_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    logprobs: Optional[int] = None,
    echo: Optional[bool] = None,
    safety_model: Optional[str] = None,
    response_format: Optional[ResponseFormat] = None,
    tools: Optional[List[ChatCompletionToolParam]] = None,
    tool_choice: Optional[Union[ChatCompletionToolChoiceEnum, ChatCompletionNamedToolChoiceParam]] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    max_completion_tokens: Optional[int] = None,
) -> ChatCompletionResponse

Source: src/together/resources/chat/completions.py:1-50

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | `str` | Yes | - | Model identifier (e.g., `meta-llama/Llama-4-Scout-17B-16E-Instruct`) |
| `messages` | `List[ChatCompletionMessageParam]` | Yes | - | List of conversation messages with roles |
| `temperature` | `float` | No | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | `float` | No | 1.0 | Nucleus sampling threshold |
| `top_k` | `int` | No | - | Top-k token selection |
| `min_p` | `float` | No | - | Minimum probability threshold |
| `max_tokens` | `int` | No | 256 | Maximum tokens to generate |
| `max_completion_tokens` | `int` | No | - | Alternative to `max_tokens` |
| `stream` | `bool` | No | False | Enable streaming response |
| `stop` | `str` or `List[str]` | No | - | Stop sequences |
| `n` | `int` | No | 1 | Number of completions to generate |
| `presence_penalty` | `float` | No | 0.0 | Penalize repeated tokens |
| `frequency_penalty` | `float` | No | 0.0 | Penalize frequent tokens |
| `repetition_penalty` | `float` | No | 1.0 | Token repetition penalty |
| `logprobs` | `int` | No | - | Return log probabilities |
| `echo` | `bool` | No | False | Echo prompt in response |
| `safety_model` | `str` | No | - | Moderation model identifier |
| `response_format` | `ResponseFormat` | No | - | Constrain output format (JSON schema) |
| `tools` | `List[ChatCompletionToolParam]` | No | - | Available function definitions |
| `tool_choice` | `str` or `dict` | No | "auto" | Tool selection strategy |
| `audio` | `ChatCompletionAudioParam` | No | - | Audio parameters for voice input |

Message Format

Message Roles

The chat completions API supports structured conversation turns through a role-based message system:

| Role | Description | Content Type |
|---|---|---|
| system | Instructions and context | Text only |
| user | Human input | Text, images, or mixed |
| assistant | Model responses | Text and tool calls |
| tool | Function execution results | Text (JSON) |
| developer | Developer instructions | Text only |

Message Structure

from typing import List

from together import Together
from together.types.chat.chat_completion_message_param import ChatCompletionMessageParam

client = Together()

messages: List[ChatCompletionMessageParam] = [
    {
        "role": "system",
        "content": "You are a helpful coding assistant."
    },
    {
        "role": "user", 
        "content": "Write a Python function to calculate factorial."
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages
)

print(response.choices[0].message.content)

Source: src/together/resources/chat/completions.py:1-50

Multimodal Messages

User messages can include both text and images using a content array:

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.png"
                }
            }
        ]
    }]
)

Image URL content items must follow specific validation rules. The image_url field must be a dictionary containing a url key with a valid URL string. Images are only permitted in user role messages.
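
The rules above can be checked on the client side before a request is sent. The following is a minimal sketch, not the SDK's own validator: the helper name `validate_message_content` is hypothetical, and the set of accepted URL schemes (http, https, data) is an assumption made for illustration.

```python
from urllib.parse import urlparse


def validate_message_content(message: dict) -> None:
    """Raise ValueError if an image item breaks the rules described above.

    Hypothetical helper; the SDK's real validation may differ.
    """
    content = message.get("content")
    if isinstance(content, str):
        return  # plain-text content has no image items to check

    for item in content or []:
        if item.get("type") != "image_url":
            continue
        if message.get("role") != "user":
            raise ValueError("images are only permitted in user messages")
        image_url = item.get("image_url")
        if not isinstance(image_url, dict) or "url" not in image_url:
            raise ValueError("image_url must be a dict containing a 'url' key")
        url = image_url["url"]
        # Assumption: accept http(s) URLs and inline data URLs only.
        if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https", "data"):
            raise ValueError(f"invalid image URL: {url!r}")
```

Running such a check over every message before calling the API surfaces malformed content locally instead of as a 400 response.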

Source: src/together/utils/files.py:1-50

Streaming Responses

The API supports server-sent events (SSE) streaming for real-time token generation:

from together import Together

client = Together()

stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming Architecture

sequenceDiagram
    participant Client
    participant APIRequestor
    participant TogetherAPI
    participant Model
    
    Client->>APIRequestor: create(stream=True)
    APIRequestor->>TogetherAPI: POST /chat/completions
    TogetherAPI->>Model: Start inference
    Model-->>TogetherAPI: Token 1
    TogetherAPI-->>APIRequestor: SSE: data: {...}
    APIRequestor-->>Client: ChatCompletionChunk
    Model-->>TogetherAPI: Token 2
    TogetherAPI-->>APIRequestor: SSE: data: {...}
    APIRequestor-->>Client: ChatCompletionChunk
    Note over Model,Client: Streaming continues...
    Model-->>TogetherAPI: [DONE]
    TogetherAPI-->>APIRequestor: [DONE]
    APIRequestor-->>Client: Iterator ends

When streaming is enabled, the synchronous client returns an iterator that yields ChatCompletionChunk objects (AsyncTogether returns an async iterator instead). Each chunk carries an incremental delta; concatenate the delta contents to reconstruct the complete response.
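
Accumulating the deltas can be sketched as follows. The function `accumulate_stream` and the `_fake_chunk` stand-in are hypothetical names introduced here; the sketch only assumes each chunk exposes `choices[0].delta.content` and `choices[0].finish_reason`, as in the streaming example above.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def accumulate_stream(chunks: Iterable) -> Tuple[str, Optional[str]]:
    """Join the delta.content pieces from the first choice of each chunk."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        if choice.delta is not None and choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def _fake_chunk(text, finish_reason=None):
    # Stand-in with the same attribute shape as a streamed chunk.
    choice = SimpleNamespace(delta=SimpleNamespace(content=text), finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


text, reason = accumulate_stream(
    [_fake_chunk("Hel"), _fake_chunk("lo"), _fake_chunk(None, "stop")]
)
print(text, reason)
```

With a real client, the `stream` object returned by `create(stream=True)` would be passed in place of the fake chunks.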

Source: src/together/resources/chat/completions.py:40-80

Function Calling

Function calling enables models to invoke predefined tools with structured outputs. This follows the OpenAI function calling schema.

Defining Tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

Tool Execution Flow

graph TD
    A[User Query] --> B[Create with tools]
    B --> C{Model selects tool?}
    C -->|Yes| D[Return tool_call]
    C -->|No| E[Return text response]
    D --> F[Execute function]
    F --> G[tool role message]
    G --> H[Continue with messages]
    H --> B
    E --> I[Final Response]
    
    style D fill:#fff3e0
    style G fill:#e8f5e9
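
The "Execute function" and "tool role message" steps in the diagram can be sketched as a small dispatcher. This is an illustration under stated assumptions: `TOOL_REGISTRY`, `run_tool_calls`, and the canned `get_weather` body are hypothetical, and `function.arguments` is assumed to arrive as a JSON-encoded string, as in the OpenAI function calling schema.

```python
import json
from types import SimpleNamespace
from typing import Callable, Dict, List


def get_weather(location: str, unit: str = "celsius") -> dict:
    # Placeholder implementation returning canned data.
    return {"location": location, "temperature": 22, "unit": unit}


# Maps tool names (as declared in the tools list) to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List) -> List[dict]:
    """Execute each requested tool and build the matching tool-role messages."""
    messages = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call.function.name)
        if func is None:
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            # Assumption: arguments is a JSON string of keyword arguments.
            result = func(**json.loads(call.function.arguments))
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return messages


fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
tool_messages = run_tool_calls([fake_call])
print(tool_messages)
```

The returned messages are appended to the conversation after the assistant's tool call message, and the request is sent again.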

Multi-turn Conversation

After receiving a function call, append the assistant's tool call message and the tool response:

import json

# Initial request with tools
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

assistant_msg = response.choices[0].message
print(f"Tool called: {assistant_msg.tool_calls[0].function.name}")
print(f"Arguments: {assistant_msg.tool_calls[0].function.arguments}")

# Simulate tool execution
tool_result = {"temperature": 22, "conditions": "Sunny"}

# Continue conversation with tool response
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    assistant_msg,
    {
        "role": "tool",
        "tool_call_id": assistant_msg.tool_calls[0].id,
        "content": json.dumps(tool_result)
    }
]

final_response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
    tools=tools
)

Note: There is a known issue (#113) where tool/function response messages with role='tool' may encounter validation errors. Ensure the tool_call_id matches exactly and the content is valid JSON.

Source: src/together/resources/chat/completions.py:20-60

CLI Interface

The Together CLI provides command-line access to chat completions:

# Basic chat completion (each --message takes a role followed by the content)
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Hello, how are you?"

# Streaming response (the default; pass --no-stream to disable)
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Write a story"

# With temperature control
together chat.completions \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "user" "Explain physics" \
    --temperature 0.8

CLI Options

| Option | Type | Description |
|---|---|---|
| --message | (str, str) multiple | Message as role-content tuple |
| --model | str | Model identifier (required) |
| --max-tokens | int | Maximum tokens to generate |
| --temperature | float | Sampling temperature |
| --top-p | float | Nucleus sampling |
| --top-k | int | Top-k sampling |
| --stop | str multiple | Stop sequences |
| --repetition-penalty | float | Repetition penalty |
| --presence-penalty | float | Presence penalty |
| --frequency-penalty | float | Frequency penalty |
| --min-p | float | Minimum p sampling |
| --no-stream | flag | Disable streaming |
| --safety-model | str | Moderation model |
| --raw | flag | Return raw JSON |

Source: src/together/cli/api/chat.py:1-100

Error Handling

The SDK provides structured exception types for different error conditions:

from together import Together
from together.error import (
    TogetherException,
    RateLimitError,
    APIConnectionError,
    Timeout,
    AuthenticationError
)

client = Together()

try:
    response = client.chat.completions.create(
        model="invalid-model-name",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limited: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except TogetherException as e:
    print(f"API error: {e}")

Exception Hierarchy

classDiagram
    class TogetherException {
        +message
    }
    class RateLimitError {
        +message
    }
    class APIConnectionError {
        +message
    }
    class Timeout {
        +message
    }
    class AuthenticationError {
        +message
    }
    class FileTypeError {
        +message
    }
    
    TogetherException <|-- RateLimitError
    TogetherException <|-- APIConnectionError
    TogetherException <|-- Timeout
    TogetherException <|-- FileTypeError
    TogetherException <|-- AuthenticationError

Common Error Codes

| Error Type | Cause | Resolution |
|---|---|---|
| 400 Bad Request | Invalid parameters | Check message format, model name |
| 401 Unauthorized | Invalid API key | Verify TOGETHER_API_KEY |
| 429 Too Many Requests | Rate limit exceeded | Implement exponential backoff |
| 500 Internal Error | Server error | Retry with backoff |
| 504 Gateway Timeout | Request timeout | Increase timeout or retry |
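
The "exponential backoff" resolution can be applied in application code with a small wrapper. This is a generic sketch, not the SDK's internal retry logic: `call_with_backoff` is a hypothetical helper, and its default delays and attempt count are illustrative values rather than the SDK's constants.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random amount up to the current cap.
            sleep(random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
```

A call such as `call_with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError, Timeout))` would retry only the error types that are worth retrying, while authentication and validation errors propagate immediately.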

Source: src/together/error.py:1-80

Response Format

Standard Response

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}]
)

# Access response attributes
print(response.id)           # chatcmpl-xxx
print(response.model)       # meta-llama/Llama-4-Scout-17B-16E-Instruct
print(response.choices[0].message.content)  # Response text
print(response.usage.prompt_tokens)         # Input tokens
print(response.usage.completion_tokens)    # Output tokens
print(response.usage.total_tokens)         # Total tokens

Streaming Chunk

for chunk in stream:
    # ChatCompletionChunk structure
    print(chunk.id)              # Same ID as final response
    print(chunk.choices[0].delta.content)  # Incremental content
    print(chunk.choices[0].finish_reason)  # None until the final chunk, then e.g. 'stop' or 'length'

Async Usage

The SDK provides async variants for concurrent operations:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def multi_chat():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Query {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(multi_chat())

Retry Logic and Timeouts

The API requestor implements automatic retry with exponential backoff:

from together.constants import (
    MAX_RETRIES,
    INITIAL_RETRY_DELAY,
    MAX_RETRY_DELAY,
    TIMEOUT_SECS
)

# Default configuration
# MAX_RETRIES: 10
# INITIAL_RETRY_DELAY: 0.5 seconds
# MAX_RETRY_DELAY: 120 seconds
# TIMEOUT_SECS: 600 seconds

# Custom configuration
client = Together(
    max_retries=5,
    timeout=300
)

The retry strategy handles:

  • Connection timeouts
  • 5xx server errors
  • Rate limit responses (429)

Source: src/together/abstract/api_requestor.py:1-100

Known Limitations

| Issue | Description | Workaround |
|---|---|---|
| typer version conflict | SDK requires typer<0.16.0 | Use virtual environments |
| Model type validation | Some model types not recognized | Use model names directly |
| Tool response format | role='tool' messages may fail validation | Ensure proper tool_call_id and JSON content |

For the most current issues and workarounds, refer to the GitHub Issues.

Best Practices

  1. Token Management: Always set max_tokens to prevent runaway generation
  2. Error Handling: Wrap API calls in try-except blocks with appropriate exception handling
  3. Streaming: Use streaming for better perceived latency on long responses
  4. Context Management: Keep message lists manageable; trim old messages when the conversation exceeds the model's context window
  5. Safety: Enable safety_model for user-facing applications
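
The context-management advice can be sketched as a helper that keeps system messages and the most recent turns. `trim_messages` is a hypothetical function, and it uses character count as a crude stand-in for tokens; an accurate budget would need the model's tokenizer.

```python
from typing import Dict, List


def trim_messages(messages: List[Dict], max_chars: int) -> List[Dict]:
    """Keep system messages plus the most recent turns that fit in max_chars.

    Character count is only an approximation of token count.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_chars - sum(len(str(m["content"])) for m in system)
    kept: List[Dict] = []
    # Walk backwards so the newest messages are kept first.
    for message in reversed(rest):
        cost = len(str(message["content"]))
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))
```

When tool calls are in play, a trimmed history should not keep a tool message whose originating assistant message was dropped; this sketch does not handle that case.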

Completions API

Related topics: Chat Completions, Embeddings and Reranking

The Completions API provides access to language model text completion endpoints in the Together AI platform. This API enables developers to generate text completions from various open-source models hosted on Together AI, supporting use cases ranging from code generation to creative writing.

Overview

The Together Python SDK provides two primary APIs for text generation:

  1. Completions API - Designed for legacy text completion models and prompt-based generation
  2. Chat Completions API - Optimized for modern chat-based models with structured message formats

Both APIs support synchronous, asynchronous, and streaming modes of operation.

Source: README.md:1-50

Installation and Setup

Environment Configuration

The SDK requires a Together API key for authentication. You can obtain one from the Together Playground settings page.

export TOGETHER_API_KEY=xxxxx

Client Initialization

from together import Together

# Using environment variable
client = Together()

# Explicit API key
client = Together(api_key="xxxxx")

Source: README.md:10-20

Usage Patterns

Synchronous Completion

The synchronous method blocks until the complete response is received:

from together import Together

client = Together()
response = client.completions.create(
    model="codellama/CodeLlama-34b-Python-hf",
    prompt="Write a Next.js component with TailwindCSS for a header component.",
    max_tokens=200,
)
print(response.choices[0].text)

Source: README.md:80-90

Streaming Completion

Streaming allows real-time response generation by processing chunks as they arrive:

from together import Together

client = Together()
stream = client.completions.create(
    model="codellama/CodeLlama-34b-Python-hf",
    prompt="Write a Next.js component with TailwindCSS for a header component.",
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Source: README.md:92-103

Asynchronous Completion

The async API enables concurrent requests for improved throughput:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()
prompts = [
    "Write a Next.js component with TailwindCSS for a header component.",
    "Write a python function for the fibonacci sequence",
]

async def async_completion(prompts):
    tasks = [
        async_client.completions.create(
            model="codellama/CodeLlama-34b-Python-hf",
            prompt=prompt,
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)

    for response in responses:
        print(response.choices[0].text)

asyncio.run(async_completion(prompts))

Source: README.md:105-125

API Parameters

Core Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier from the available Together AI models |
| prompt | string | Yes | The input prompt for text generation |
| max_tokens | integer | No | Maximum number of tokens to generate |
| temperature | float | No | Sampling temperature (0.0-2.0, default varies by model) |
| top_p | float | No | Nucleus sampling probability threshold |
| top_k | integer | No | Top-k sampling parameter |
| stream | boolean | No | Enable streaming response (default: false) |
| n | integer | No | Number of completions to generate |
| stop | string/array | No | Stop sequence(s) to end generation |
| logprobs | integer | No | Number of top log probabilities to return |
| echo | boolean | No | Echo the prompt in the response |
| repetition_penalty | float | No | Penalty for token repetition |
| presence_penalty | float | No | Penalize tokens based on presence |
| frequency_penalty | float | No | Penalize tokens based on frequency |
| min_p | float | No | Minimum probability threshold for sampling |
| safety_model | string | No | Moderation model to use |

Source: src/together/cli/api/completions.py:1-50

Parameter Details

#### Sampling Parameters

  • temperature: Controls randomness in generation. Lower values (0.1-0.3) produce more deterministic output, while higher values (0.7-1.0) increase creativity.
  • top_p: Also known as nucleus sampling, controls the cumulative probability mass to consider.
  • top_k: Limits token selection to the top k most probable tokens.
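
The effect of these three parameters on a next-token distribution can be illustrated with a toy example. This sketch shows the standard textbook definitions of temperature scaling, top-k, and nucleus (top-p) filtering; the function name, the tiny vocabulary, and the order in which the filters are applied are assumptions, and the serving backend may combine them differently.

```python
import math
from typing import Dict, Optional


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Return the renormalized next-token distribution after filtering.

    temperature must be greater than zero.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )

    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most probable tokens

    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break  # smallest prefix whose mass reaches top_p
        ranked = kept

    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


logits = {"the": 2.0, "a": 1.0, "cat": 0.0, "zebra": -2.0}
print(filter_distribution(logits, temperature=0.2))
print(filter_distribution(logits, temperature=1.0, top_k=2))
print(filter_distribution(logits, temperature=1.0, top_p=0.95))
```

Lowering the temperature concentrates probability on the top token, while top_k and top_p remove the low-probability tail before sampling.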

#### Repetition Control

  • repetition_penalty: Values > 1.0 discourage repetition, values < 1.0 encourage it.
  • presence_penalty: Positive values encourage discussing new topics.
  • frequency_penalty: Positive values reduce repetition of high-frequency tokens.
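
One widely used formulation of repetition_penalty rescales the logits of tokens that have already been generated. The sketch below shows that formulation for intuition only; whether the Together backend applies exactly this rule is an assumption, and `apply_repetition_penalty` is a hypothetical name.

```python
from typing import Dict, Iterable


def apply_repetition_penalty(
    logits: Dict[str, float], generated: Iterable[str], penalty: float
) -> Dict[str, float]:
    """Rescale logits of already-generated tokens by the given penalty."""
    seen = set(generated)
    adjusted = {}
    for token, logit in logits.items():
        if token in seen:
            # Dividing a positive logit or multiplying a negative one both
            # lower the score when penalty > 1 (and raise it when penalty < 1).
            logit = logit / penalty if logit > 0 else logit * penalty
        adjusted[token] = logit
    return adjusted


logits = {"cat": 2.0, "dog": 1.5, "fish": -1.0}
penalized = apply_repetition_penalty(logits, generated=["cat", "fish"], penalty=1.25)
print(penalized)
```

Tokens that have not appeared yet ("dog" here) are left untouched, which is why values above 1.0 discourage repetition and values below 1.0 encourage it.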

CLI Usage

The SDK includes a command-line interface for completions:

together completions "Your prompt here" --model codellama/CodeLlama-34b-Python-hf

CLI Options

| Option | Short | Description | Default |
|---|---|---|---|
| --model | -m | Model name | Required |
| --max-tokens | -t | Max tokens to generate | None |
| --temperature | -T | Sampling temperature | None |
| --top-p | -p | Top p sampling | None |
| --top-k | -k | Top k sampling | None |
| --stop | -s | Stop sequences (multiple allowed) | None |
| --no-stream | -ns | Disable streaming | False |
| --repetition-penalty | -rp | Repetition penalty | None |
| --presence-penalty | -pp | Presence penalty | None |
| --frequency-penalty | -fp | Frequency penalty | None |
| --min-p | -mp | Minimum p | None |
| --logprobs | -l | Return log probabilities | None |
| --echo | -e | Echo prompt in response | False |
| --n | -n | Number of generations | None |
| --safety-model | -sm | Moderation model | None |
| --raw | -r | Return raw JSON response | False |

Source: src/together/cli/api/completions.py:1-75

CLI Streaming Output

When streaming is enabled (default), the CLI processes chunks in real-time:

if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices

        if raw:
            click.echo(f"{json.dumps(chunk.model_dump(exclude_none=True))}")
            continue

        for stream_choice in sorted(chunk.choices, key=lambda c: c.index):
            assert isinstance(stream_choice, CompletionChoicesChunk)
            assert stream_choice.delta
            click.echo(f"{stream_choice.delta.content}", nl=False)

Source: src/together/cli/api/completions.py:45-65

Response Structure

Completion Response

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| choices | array | Array of completion choices |
| choices[].text | string | Generated text content |
| choices[].index | integer | Choice index for multiple completions |
| choices[].finish_reason | string | Reason for completion ending |
| model | string | Model used for generation |
| usage | object | Token usage statistics |

Streaming Chunk Response

| Field | Type | Description |
|---|---|---|
| id | string | Chunk identifier |
| choices | array | Array of delta choices |
| choices[].delta | object | Incremental text delta |
| choices[].delta.content | string | Delta text content |
| choices[].index | integer | Choice index |

Architecture

Request Flow

graph TD
    A[Client.completions.create] --> B[Validate Parameters]
    B --> C[APIRequestor]
    C --> D{HTTP Method}
    D -->|POST| E[Send Request to together.ai]
    D -->|Streaming| F[Return Chunk Iterator]
    E --> G[Parse Response]
    G --> H[Return CompletionResponse]
    F --> I[Stream Chunks]
    I --> J[Yield CompletionChunk]

Response Handling

graph TD
    A[API Response] --> B{Streaming Mode?}
    B -->|Yes| C[Return Chunk Iterator]
    B -->|No| D[Return TogetherResponse]
    C --> E[CompletionChunk]
    D --> F[CompletionResponse]

Error Handling

Exception Types

The SDK defines specific exception types for different error conditions:

| Exception | Description |
|---|---|
| TogetherException | Base exception class |
| RateLimitError | API rate limit exceeded |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout |
| FileTypeError | Invalid file type |
| AttributeError | Invalid attribute access |

Source: src/together/error.py:1-60

Error Response Structure

class TogetherErrorResponse(BaseModel):
    message: str
    type: str
    code: Optional[str] = None
    param: Optional[str] = None

Common Error Scenarios

  1. Rate Limiting: When API rate limits are exceeded, the SDK automatically retries with exponential backoff based on configuration.
  2. Timeout: Requests time out after a configurable number of seconds:

# Default timeout is 60 seconds
TIMEOUT_SECS = 60

Source: src/together/abstract/api_requestor.py:20-40

  3. Invalid Model: Returns a validation error with the available model list

Configuration Options

Client Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | string | env: TOGETHER_API_KEY | API authentication key |
| base_url | string | api.together.ai | API base URL |
| timeout | integer | 60 | Request timeout in seconds |
| max_retries | integer | 3 | Maximum retry attempts |

Retry Configuration

MAX_RETRIES = 3
INITIAL_RETRY_DELAY = 0.5  # seconds
MAX_RETRY_DELAY = 2.0  # seconds
MAX_CONNECTION_RETRIES = 2
MAX_SESSION_LIFETIME_SECS = 300

Source: src/together/abstract/api_requestor.py:20-40

Known Limitations and Issues

Dependency Conflicts

The SDK has a dependency on typer<0.16.0, which may cause conflicts with projects requiring newer versions of typer. This is a known issue tracked in #348.

Variable Scope Issue

A known UnboundLocalError issue can occur in certain error scenarios when the result variable is referenced before assignment. This is being tracked in #143.

Best Practices

Efficient Usage

  1. Use Streaming for Long Outputs: When expecting long completions, use streaming to improve perceived latency
  2. Batch Requests with Async: Use AsyncTogether for parallel API calls
  3. Set Appropriate Limits: Configure max_tokens to prevent excessive generation

Production Considerations

  1. Implement Retry Logic: The SDK handles retries, but implement additional logic for critical operations
  2. Monitor Token Usage: Track usage via response usage field
  3. Use Safety Models: Enable moderation for user-facing applications
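
Monitoring token usage can be as simple as summing the usage field across responses. The `UsageTracker` class below is a hypothetical sketch; it assumes only that a response exposes `usage.prompt_tokens` and `usage.completion_tokens`, as shown in the response examples of this manual.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Running totals of token usage across API responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, response) -> None:
        usage = getattr(response, "usage", None)
        if usage is None:
            return  # e.g. streamed chunks may carry no usage data
        self.prompt_tokens += usage.prompt_tokens or 0
        self.completion_tokens += usage.completion_tokens or 0
        self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


tracker = UsageTracker()
# Stand-in for a real response object.
tracker.record(SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)))
tracker.record(SimpleNamespace(usage=SimpleNamespace(prompt_tokens=8, completion_tokens=5)))
print(tracker.total_tokens, tracker.requests)
```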

Embeddings and Reranking

Related topics: Chat Completions, Files API

The Together Python SDK provides first-class support for text embeddings and document reranking through dedicated resource classes. These features enable semantic search, document retrieval, and information discovery workflows by converting text into dense vector representations and reordering search results based on relevance.

Overview

Embeddings and reranking are complementary capabilities that power modern retrieval-augmented generation (RAG) and search systems. The SDK exposes these through the embeddings and rerank namespaces on the main Together client, following a consistent pattern with other API resources like chat completions.

graph LR
    A[Text Input] --> B[Embeddings API]
    B --> C[Vector Embeddings]
    C --> D[Reranking API]
    D --> E[Re-ranked Results]
    
    F[Query] --> D
    G[Document Pool] --> D

Key characteristics:

  • Both endpoints use the same Together client instance
  • Responses are returned as Pydantic model objects for type safety
  • Both support synchronous and async patterns via Together and AsyncTogether
  • Replacing newlines in input text with spaces is recommended before embedding

Embeddings

Purpose and Use Cases

The Embeddings API converts text into high-dimensional vector representations that capture semantic meaning. These vectors can be stored in vector databases and used for similarity search, clustering, or as features for downstream ML tasks.

Common use cases include:

  • Semantic search systems
  • Document clustering and categorization
  • Recommendation systems
  • Duplicate detection
  • Feature extraction for classification tasks
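
Most of these use cases reduce to comparing vectors. A minimal, dependency-free sketch of cosine similarity and a ranking helper follows; the function names are illustrative, and a production system would typically use a vector database or a numerical library instead.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: Sequence[float], doc_vecs: List[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 2-D vectors standing in for real embeddings.
print(rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]))
```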

Python Client Usage

from typing import List
from together import Together

client = Together()

def get_embeddings(texts: List[str], model: str) -> List[List[float]]:
    # Normalize newlines as recommended by the SDK
    texts = [text.replace("\n", " ") for text in texts]
    
    outputs = client.embeddings.create(model=model, input=texts)
    
    # Extract embedding vectors in order
    return [outputs.data[i].embedding for i in range(len(texts))]

# Example usage
input_texts = ["Our solar system orbits the Milky Way galaxy at about 515,000 mph"]
embeddings = get_embeddings(
    input_texts,
    model="togethercomputer/m2-bert-80M-8k-retrieval"
)
print(embeddings)

Embeddings Response Model

The EmbeddingsCreateResponse model provides structured access to API responses:

| Field | Type | Description |
|---|---|---|
| object | str | Object type, typically "list" |
| data | List[Embedding] | List of embedding objects |
| model | str | Model used for embeddings |
| usage | EmbeddingUsage | Token usage statistics |

Each Embedding object contains:

| Field | Type | Description |
|---|---|---|
| object | str | Object type, typically "embedding" |
| embedding | List[float] | The embedding vector |
| index | int | Position in the input list |

The EmbeddingUsage object tracks:

| Field | Type | Description |
|---|---|---|
| prompt_tokens | int | Tokens in the input |
| total_tokens | int | Total tokens processed |

API Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | str | Yes | - | Embedding model identifier |
| input | Union[str, List[str]] | Yes | - | Text(s) to embed |

Available Embedding Models

The SDK works with embedding models available on the Together platform. Common models include:

  • togethercomputer/m2-bert-80M-8k-retrieval - 8K context, 80M parameters
  • togethercomputer/m2-bert-80M-2k-retrieval - 2K context, 80M parameters

Model availability can be queried using:

models = client.models.list()
# Filter for embedding models

Reranking

Purpose and Use Cases

The Reranking API takes a query and a set of documents, then returns those documents reordered by relevance to the query. This is particularly valuable when combined with embeddings-based retrieval to refine initial search results.

Common use cases include:

  • Improving search result quality after initial embedding-based retrieval
  • Multi-stage retrieval pipelines
  • Reordering candidates from vector similarity search
  • Question answering systems retrieving relevant context

Python Client Usage

from typing import List
from together import Together

client = Together()

def get_reranked_documents(
    query: str, 
    documents: List[str], 
    model: str, 
    top_n: int = 3
) -> List[str]:
    outputs = client.rerank.create(
        model=model,
        query=query,
        documents=documents,
        top_n=top_n
    )
    
    # Each result carries the index of its document in the input list;
    # sort the results by relevance score and return the original documents
    ranked = sorted(outputs.results, key=lambda r: r.relevance_score, reverse=True)
    return [documents[r.index] for r in ranked]

# Example usage
query = "What is the capital of the United States?"
documents = ["New York", "Washington, D.C.", "Los Angeles"]

# The model argument is required; verify the identifier against the models available to you
reranked = get_reranked_documents(query, documents, model="BAAI/bge-reranker", top_n=3)
print(reranked)  # e.g. ["Washington, D.C.", "New York", "Los Angeles"]

Reranking Response Model

The RerankResponse model provides structured access to reranking results:

| Field | Type | Description |
|---|---|---|
| id | str | Request identifier |
| results | List[Ranking] | List of ranked documents |
| meta | RerankMeta | Metadata including model and usage |
| object | str | Object type |

Each Ranking object contains:

| Field | Type | Description |
|---|---|---|
| index | int | Original document index |
| relevance_score | float | Relevance score (higher = more relevant) |
| document | Document | The document object with text |

The Document object:

| Field | Type | Description |
|---|---|---|
| text | str | Document text content |

The RerankMeta object:

| Field | Type | Description |
|---|---|---|
| model_id | str | Model used for reranking |
| usage | RerankUsage | Token usage statistics |

API Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | str | Yes | - | Reranking model identifier |
| query | str | Yes | - | The query to rank documents against |
| documents | List[str] | Yes | - | Documents to be ranked |
| top_n | int | No | 3 | Number of top results to return |
| max_chunks_per_doc | int | No | None | Max chunks per document (model-dependent) |
| return_documents | bool | No | True | Whether to include document text in response |

Combined Workflow

A typical retrieval pipeline combines embeddings and reranking:

graph TD
    A[User Query] --> B[Embed Query]
    C[Document Corpus] --> D[Embed All Documents]
    B --> E[Vector Similarity Search]
    D --> E
    E --> F[Candidate Documents]
    F --> G[Rerank with Query]
    G --> H[Final Results]
    
    I[Vector Database] <--> D

Complete Example

from typing import List
from together import Together

client = Together()

EMBEDDING_MODEL = "togethercomputer/m2-bert-80M-8k-retrieval"
RERANK_MODEL = "BAAI/bge-reranker"

def semantic_search(
    query: str,
    documents: List[str],
    embedding_model: str = EMBEDDING_MODEL,
    rerank_model: str = RERANK_MODEL,
    top_k: int = 10,
    final_k: int = 3
) -> List[dict]:
    """
    Combined embeddings + reranking search pipeline.
    """
    # Step 1: Embed the query
    query_embedding = client.embeddings.create(
        model=embedding_model,
        input=query.replace("\n", " ")
    ).data[0].embedding
    
    # Step 2: Embed all documents
    doc_embeddings = client.embeddings.create(
        model=embedding_model,
        input=[doc.replace("\n", " ") for doc in documents]
    )
    
    # Step 3: Dot-product similarity (for demonstration); this equals cosine
    # similarity only when the embeddings are unit-normalized
    # In production, use a proper vector database
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings.data):
        similarity = sum(q * d for q, d in zip(query_embedding, doc_emb.embedding))
        similarities.append((i, similarity))
    
    # Sort by similarity and take top_k
    similarities.sort(key=lambda x: x[1], reverse=True)
    candidate_indices = [idx for idx, _ in similarities[:top_k]]
    candidate_docs = [documents[i] for i in candidate_indices]
    
    # Step 4: Rerank candidates
    rerank_results = client.rerank.create(
        model=rerank_model,
        query=query,
        documents=candidate_docs,
        top_n=final_k
    )
    
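    # Step 5: Extract final results with scores.
    # result.index is a position in candidate_docs, so map it back to the
    # position in the original documents list
    results = []
    for result in rerank_results.results:
        results.append({
            "document": candidate_docs[result.index],
            "relevance_score": result.relevance_score,
            "original_index": candidate_indices[result.index]
        })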
    
    return results

# Usage
query = "machine learning optimization techniques"
corpus = [
    "Gradient descent is a first-order iterative optimization algorithm.",
    "The capital of France is Paris.",
    "Stochastic gradient descent uses random subsets of data.",
    "Climate change affects global weather patterns.",
    "Adam optimizer combines momentum and RMSprop concepts."
]

results = semantic_search(query, corpus)
for r in results:
    print(f"Score: {r['relevance_score']:.4f} - {r['document']}")

Async Usage

Both embeddings and reranking support asynchronous operations:

import asyncio
from together import AsyncTogether

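async_client = AsyncTogether()

# Illustrative batches; replace with your own documents
batched_documents = [
    ["First document", "Second document"],
    ["Third document", "Fourth document"],
]

async def async_embeddings():
    # One request per batch, issued concurrently
    tasks = [
        async_client.embeddings.create(
            model="togethercomputer/m2-bert-80M-8k-retrieval",
            input=texts
        )
        for texts in batched_documents
    ]
    results = await asyncio.gather(*tasks)
    return results

async def async_rerank():
    return await async_client.rerank.create(
        model="BAAI/bge-reranker",
        query="What is deep learning?",
        documents=["Doc 1", "Doc 2", "Doc 3"],
        top_n=3
    )

async def main():
    embeddings = await async_embeddings()
    reranked = await async_rerank()
    return embeddings, reranked

# Run both coroutines in a single event loop
asyncio.run(main())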

CLI Support

The CLI provides commands for embeddings and reranking operations:

# Embeddings via CLI (using completions with embeddings model)
together completions \
  "Our solar system orbits the Milky Way galaxy" \
  --model togethercomputer/m2-bert-80M-8k-retrieval

Note: Direct CLI commands for embeddings may require specific model configurations. For full reranking CLI support, use the Python API.

Error Handling

Both resources can raise standard Together exceptions defined in src/together/error.py:

| Error Type | Description |
| --- | --- |
| TogetherException | Base exception class |
| RateLimitError | API rate limit exceeded |
| APIConnectionError | Network connectivity issues |

from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.embeddings.create(
        model="togethercomputer/m2-bert-80M-8k-retrieval",
        input="Sample text"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except TogetherException as e:
    print(f"API error: {e}")

Input Text Normalization

The SDK documentation recommends normalizing newline characters in input text:

# Recommended: normalize input text
normalized_texts = [text.replace("\n", " ") for text in texts]

# Create embeddings
response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=normalized_texts
)

This normalization helps ensure consistent embedding quality across varied text inputs.

Known Limitations

Based on community feedback and issue tracking:

  1. Model availability: Embedding and reranking model availability may vary. Always verify model identifiers against the Together model marketplace.
  2. Batch sizes: Large batches of documents may require multiple API calls. Consider batching strategies for large document collections.
  3. Token limits: Both APIs have token limits that may restrict single-request document counts. Monitor usage fields in responses.
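
One possible batching strategy for the second limitation is to split the document list into fixed-size chunks and send one request per chunk. A minimal sketch follows; chunked is a hypothetical helper, and the batch size shown is an arbitrary example value rather than a documented API limit.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


documents = [f"doc {i}" for i in range(5)]
batches = list(chunked(documents, batch_size=2))
print([len(batch) for batch in batches])
```

Each resulting batch can then be passed as the input argument of a separate embeddings request.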


Image Generation

Related topics: Chat Completions


The Image Generation module in together-python provides programmatic access to Together AI's image generation API, letting developers create images from text prompts with diffusion models. It supports synchronous and asynchronous requests, includes a CLI, and returns images as Base64-encoded data or URLs.

Overview

The Image Generation feature is part of the Together AI Python SDK that abstracts the complexity of API communication and response parsing. It allows developers to:

  • Generate images from text prompts using supported diffusion models
  • Configure generation parameters such as dimensions, steps, and seed
  • Support negative prompts to guide generation away from unwanted elements
  • Return images as Base64-encoded data or URLs
  • Integrate seamlessly with other SDK features like chat completions and embeddings

Image generation is accessed through the client.images namespace in the main Together client, following a consistent pattern used throughout the SDK. Source: src/together/resources/images.py:1-50

Architecture

Component Overview

The image generation system consists of several interconnected components that work together to provide a unified interface:

graph TD
    A[User Code] --> B[Together Client]
    B --> C[Images Resource]
    C --> D[APIRequestor]
    D --> E[Together API]
    E --> F[ImageResponse]
    F --> G[User Code]
    
    H[CLI Command] --> C
    I[ImageCLI] --> B
    
    J[ImageRequest Type] --> C
    K[ImageResponse Type] --> F

Module Structure

| Component | File Path | Purpose |
| --- | --- | --- |
| Images Resource | src/together/resources/images.py | Main API client for image generation |
| Image Types | src/together/types/images.py | Pydantic models for request/response validation |
| CLI Module | src/together/cli/api/images.py | Command-line interface for image generation |
| File Utils | src/together/utils/files.py | Helper utilities for file operations |

Source: src/together/resources/images.py

API Reference

Client Method: `client.images.generate()`

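The primary method for generating images. It is available on both the synchronous client (Together) and the asynchronous client (AsyncTogether). The signature below is the asynchronous form; the synchronous method takes the same parameters and is called without await.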

Signature:

async def generate(
    self,
    prompt: str,
    model: str,
    *,
    seed: Optional[int] = None,
    n: int = 1,
    height: int = 1024,
    width: int = 1024,
    negative_prompt: Optional[str] = None,
    **kwargs,
) -> ImageResponse

#### Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | Required | Text description of the desired image |
| model | str | Required | Model identifier (e.g., stabilityai/stable-diffusion-xl-base-1.0) |
| seed | int | None | Random seed for reproducible generation |
| n | int | 1 | Number of images to generate |
| height | int | 1024 | Output image height in pixels |
| width | int | 1024 | Output image width in pixels |
| negative_prompt | str | None | Prompt describing elements to avoid |
| `**kwargs` | Any | N/A | Additional model-specific parameters |

Source: src/together/resources/images.py:36-60

#### Returns

| Field | Type | Description |
| --- | --- | --- |
| data | List[ImageChoicesData] | List of generated image objects |
| data[0].b64_json | str | Base64-encoded PNG image data |
| data[0].url | str | Remote URL to the generated image (if available) |
| data[0].revised_prompt | str | Prompt revised by the model's safety filter |

Source: src/together/types/images.py

Response Type: `ImageResponse`

The ImageResponse object wraps the API response with additional metadata:

class ImageResponse(TogetherBaseResponse):
    data: List[ImageChoicesData]

Source: src/together/types/images.py

Usage Patterns

Basic Synchronous Usage

from together import Together

client = Together()

response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=4,
)

# Access base64-encoded images
for image_data in response.data:
    print(image_data.b64_json)

# Access revised prompt (if modified by safety filter)
for image_data in response.data:
    if image_data.revised_prompt:
        print(f"Revised prompt: {image_data.revised_prompt}")

Source: README.md

Using Seed for Reproducibility

from together import Together

client = Together()

# Generate with a fixed seed for reproducible results
response = client.images.generate(
    prompt="a serene mountain landscape at sunset",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    seed=42,
    n=1,
    height=768,
    width=768,
)

Multiple Images in Single Request

import base64

from together import Together

client = Together()

response = client.images.generate(
    prompt="a bowl of fresh fruit",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Generate 4 variations
    width=512,
    height=512,
)

# Decode each Base64 image and save it to disk
for idx, image_data in enumerate(response.data):
    image_bytes = base64.b64decode(image_data.b64_json)
    with open(f"generated_image_{idx}.png", "wb") as f:
        f.write(image_bytes)

CLI Interface

The Together CLI provides a convenient interface for image generation without writing Python code.

Command Structure

together images generate "prompt text" --model <MODEL_NAME> [OPTIONS]

Source: src/together/cli/api/images.py:1-30

CLI Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --model | str | Required | Model name to use for generation |
| --steps | int | 20 | Number of diffusion steps |
| --seed | int | None | Random seed for reproducibility |
| --n | int | 1 | Number of images to generate |
| --height | int | 1024 | Image height in pixels |
| --width | int | 1024 | Image width in pixels |
| --negative-prompt | str | None | Elements to avoid in generation |
| --output | path | . | Output directory for generated images |
| --prefix | str | image- | Filename prefix for saved images |
| --no-show | flag | False | Do not open images in viewer |

Source: src/together/cli/api/images.py:31-70

CLI Usage Examples

Basic image generation:

together images generate "space robots" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --n 4

Custom dimensions with reproducible seed:

together images generate "mountain landscape" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --seed 12345 \
  --width 512 \
  --height 768 \
  --steps 30

Save to specific directory without viewing:

together images generate "abstract art" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output ./generated_images \
  --prefix "artwork-" \
  --no-show

Image Display Behavior

By default, the CLI automatically opens generated images in the system's default image viewer using the PIL.Image library. This behavior can be disabled with the --no-show flag. Source: src/together/cli/api/images.py:70-90

Supported Models

The Together AI platform supports various image generation models. The SDK allows any compatible model identifier to be passed directly:

| Model Family | Example Model Identifier | Typical Use |
| --- | --- | --- |
| Stable Diffusion XL | stabilityai/stable-diffusion-xl-base-1.0 | General purpose generation |
| Flux | black-forest-labs/FLUX.1-dev | High-quality artistic generation |
| Playground | playgroundai/playground-v2.5 | Versatile creative work |

Source: README.md

To list all available image generation models programmatically:

from together import Together

client = Together()
models = client.models.list()  # returns a list of model objects

# Filter for image models
image_models = [m for m in models if m.type == "image"]
for model in image_models:
    print(f"{model.display_name}: {model.id}")

Request/Response Flow

sequenceDiagram
    participant User
    participant Client
    participant ImagesResource
    participant APIRequestor
    participant TogetherAPI
    participant ImageResponse
    
    User->>Client: client.images.generate(...)
    Client->>ImagesResource: generate(prompt, model, ...)
    ImagesResource->>ImageRequest: Create ImageRequest
    ImagesResource->>APIRequestor: arequest(POST /images/generations)
    APIRequestor->>TogetherAPI: HTTP POST Request
    TogetherAPI-->>APIRequestor: JSON Response
    APIRequestor-->>ImagesResource: TogetherResponse
    ImagesResource->>ImageResponse: Parse response data
    ImageResponse-->>User: ImageResponse with image data
    
    Note over User,TogetherAPI: Base64 images available in response.data[].b64_json

Source: src/together/resources/images.py:40-70

Common Issues and Troubleshooting

Pillow Version Compatibility

Some users have reported transitive dependency conflicts with the pillow library. The SDK depends on specific pillow versions for image handling and display features in the CLI. If you encounter conflicts with other packages requiring newer pillow versions, consider using separate virtual environments. Source: GitHub Issue #237

Large Image Base64 Handling

Generated images are returned as Base64-encoded strings in the b64_json field. When processing large images or several images at once, make sure your application has enough memory, since each string is held in memory and then decoded. The SDK does not impose a maximum size limit, but the Together API limits images to approximately 10MB when using base64-encoded format. Source: src/together/utils/files.py
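
To budget memory before decoding, the decoded size can be computed from the length of the Base64 string alone: every four Base64 characters carry three bytes, minus one byte per trailing padding character. The sketch below assumes standard padded Base64 with no embedded line breaks; decoded_size_bytes is a hypothetical helper, not an SDK function.

```python
import base64


def decoded_size_bytes(b64_data: str) -> int:
    """Return the decoded byte count of a padded Base64 string without decoding it."""
    stripped = b64_data.strip()
    padding = len(stripped) - len(stripped.rstrip("="))
    return (len(stripped) * 3) // 4 - padding


# Sample payload standing in for a value of image_data.b64_json
sample = base64.b64encode(b"x" * 1000).decode("ascii")
print(len(sample), decoded_size_bytes(sample))
```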

API Key Configuration

Image generation requires a valid Together API key. Ensure the TOGETHER_API_KEY environment variable is set or passed directly to the client:

# Via environment variable
# export TOGETHER_API_KEY=your_api_key

client = Together()  # Reads from environment

# Or explicitly
client = Together(api_key="your_api_key")

Rate Limiting

Like other API endpoints, image generation is subject to rate limits. If you encounter RateLimitError, implement exponential backoff in your application:

import time
from together import Together
from together.error import RateLimitError

client = Together()
max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.images.generate(
            prompt="your prompt",
            model="stabilityai/stable-diffusion-xl-base-1.0"
        )
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            time.sleep(wait_time)
        else:
            raise

Source: src/together/error.py:40-55
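
The retry loop above can be factored into a reusable helper. The following is a sketch under stated assumptions: retry_with_backoff is a hypothetical function, random jitter is added so that concurrent clients do not retry in lockstep, and FakeRateLimit stands in for RateLimitError so the example runs without network access.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` up to max_retries times, sleeping longer after each failure."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Exponential delay (1x, 2x, 4x, ...) plus up to 100% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise RuntimeError("max_retries must be at least 1")


class FakeRateLimit(Exception):
    """Stand-in for together.error.RateLimitError in this offline demo."""


attempts = {"count": 0}


def flaky() -> str:
    # Fails twice, then succeeds
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit()
    return "ok"


delays = []
result = retry_with_backoff(flaky, (FakeRateLimit,), sleep=delays.append)
print(result, attempts["count"], len(delays))
```

With the SDK, call would be a lambda wrapping client.images.generate(...) and retry_on would be (RateLimitError,). The sleep parameter is injectable only so the delays can be observed or skipped in tests.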

Error Handling

The SDK provides specific exception types for various error conditions:

| Exception Type | Description |
| --- | --- |
| TogetherException | Base exception for all SDK errors |
| RateLimitError | API rate limit exceeded |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout |

Source: src/together/error.py

Example error handling:

from together import Together
from together.error import TogetherException, RateLimitError, Timeout

client = Together()

try:
    response = client.images.generate(
        prompt="test image",
        model="stabilityai/stable-diffusion-xl-base-1.0"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except Timeout:
    print("Request timed out. The image may be complex - try with fewer steps.")
except TogetherException as e:
    print(f"API error: {e}")

Best Practices

1. Optimize Image Dimensions

For faster generation, use smaller dimensions initially and upscale if needed:

# Faster initial generation
response = client.images.generate(
    prompt="landscape",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    height=512,
    width=512,
    steps=20,  # Fewer steps for draft
)

2. Use Seeds for Iteration

When refining a concept, use a fixed seed to maintain consistency:

base_seed = 42

# Generate variations while maintaining composition
for i in range(4):
    response = client.images.generate(
        prompt=f"landscape with {'spring' if i % 2 == 0 else 'autumn'} colors",
        model="stabilityai/stable-diffusion-xl-base-1.0",
        seed=base_seed,
    )

3. Batch Generation

Generate multiple images in a single request when possible for efficiency:

response = client.images.generate(
    prompt="concept variations",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Single API call for 4 images
)

4. Handle Revised Prompts

The API may modify prompts for safety reasons. Always check for revised prompts:

prompt = "your prompt here"

response = client.images.generate(
    prompt=prompt,
    model="stabilityai/stable-diffusion-xl-base-1.0",
)

# Compare each returned prompt with the one that was sent
for image_data in response.data:
    if image_data.revised_prompt and image_data.revised_prompt != prompt:
        print(f"Prompt was revised to: {image_data.revised_prompt}")


Files API

Related topics: Fine-Tuning


The Files API provides capabilities for uploading, managing, and validating training datasets for use with Together AI's fine-tuning services. It serves as the foundation for preparing training data that powers model customization workflows.

Overview

The Files API enables developers to:

  • Upload training datasets in JSONL format for fine-tuning jobs
  • Validate file content locally before uploading to catch formatting errors early
  • Manage remote files (list, retrieve, delete) on Together's infrastructure
  • Support multimodal content including text and image data for vision model training

Source: README.md

Architecture

graph TD
    A[User Code / CLI] --> B[Files API Client]
    B --> C[FileManager]
    C --> D[Together API]
    
    E[Local Validation] --> B
    E --> F[files.py utils]
    F --> G[JSONL Parser]
    G --> H[Content Validators]
    
    I[Fine-tuning] --> D
    I --> C
    
    style D fill:#e1f5fe
    style C fill:#fff3e0
    style F fill:#f3e5f5

Component Overview

| Component | File | Responsibility |
| --- | --- | --- |
| Together client | filemanager.py | Main API entry point |
| FileManager | filemanager.py | Handles file operations |
| files.py utils | utils/files.py | Local validation and parsing |
| CLI commands | cli/api/files.py | Command-line interface |

Source: src/together/filemanager.py

File Validation

The SDK provides robust local validation capabilities through the files.py utility module. This validation runs before uploads to catch formatting errors early, preventing failed fine-tuning jobs due to malformed data.

Validation Rules

The validator checks multiple aspects of your JSONL files:

| Validation Rule | Description | Error Type |
| --- | --- | --- |
| content field type | Must be a list of dicts | InvalidFileFormatError |
| type field presence | Each item must have a type field | InvalidFileFormatError |
| Text content | For type: "text", must have valid text string | InvalidFileFormatError |
| Image content | For type: "image_url", must have valid image_url dict | InvalidFileFormatError |
| Image size | Base64 images must be under 10MB | InvalidFileFormatError |
| Image limit | Maximum 10 images per example | InvalidFileFormatError |
| Image role | Images only allowed in user messages | InvalidFileFormatError |

Source: src/together/utils/files.py
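
To make the rules concrete, here is a simplified sketch of how the structural checks could be expressed. It is not the SDK's implementation: check_content is a hypothetical function, it collects messages instead of raising InvalidFileFormatError, it counts images per message rather than per example, it accepts plain-string content, and it omits the Base64 size check.

```python
from typing import Any, List

MAX_IMAGES_PER_EXAMPLE = 10  # limit taken from the validation rules table


def check_content(content: Any, role: str) -> List[str]:
    """Return the problems found in one message's content (empty list if none)."""
    if isinstance(content, str):
        return []  # plain-text content needs no per-item checks
    if not isinstance(content, list) or not all(isinstance(i, dict) for i in content):
        return ["content must be a string or a list of dicts"]

    problems: List[str] = []
    image_count = 0
    for index, item in enumerate(content):
        item_type = item.get("type")
        if item_type is None:
            problems.append(f"item {index}: missing 'type' field")
        elif item_type == "text":
            if not isinstance(item.get("text"), str):
                problems.append(f"item {index}: 'text' must be a string")
        elif item_type == "image_url":
            image_count += 1
            image_url = item.get("image_url")
            if not isinstance(image_url, dict) or not isinstance(image_url.get("url"), str):
                problems.append(f"item {index}: 'image_url' must be a dict with a 'url' string")
            if role != "user":
                problems.append(f"item {index}: images are only allowed in user messages")
        else:
            problems.append(f"item {index}: unknown type {item_type!r}")
    if image_count > MAX_IMAGES_PER_EXAMPLE:
        problems.append(f"too many images: {image_count} > {MAX_IMAGES_PER_EXAMPLE}")
    return problems


good = [
    {"type": "text", "text": "Describe this"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
]
bad = [{"text": "no type"}, {"type": "image_url", "image_url": "not-a-dict"}]

print(check_content(good, role="user"))
for problem in check_content(bad, role="assistant"):
    print(problem)
```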

Supported Content Types

# Text content
{"type": "text", "text": "The training prompt here"}

# Image URL content
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}

# Base64 image content
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
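
A Base64 image item can be built from raw bytes with the standard library. The sketch below uses a hypothetical helper, to_image_content, and assumes the 10MB limit is measured on the encoded string; the validator may measure it differently.

```python
import base64

# Assumption: the 10MB limit applies to the length of the Base64 string
MAX_BASE64_IMAGE_BYTES = 10 * 1024 * 1024


def to_image_content(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content item using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if len(encoded) > MAX_BASE64_IMAGE_BYTES:
        raise ValueError("Base64-encoded image exceeds the 10MB limit")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# The first four bytes of a JPEG file, used here as a tiny stand-in payload
item = to_image_content(b"\xff\xd8\xff\xe0")
print(item["image_url"]["url"])
```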

Multimodal Dataset Structure

The validator supports multimodal datasets for vision model fine-tuning:

graph LR
    A[JSONL Line] --> B{Parse content}
    B -->|List| C[Validate each item]
    B -->|String| D[Plain text]
    
    C --> E{type == "text"?}
    C --> F{type == "image_url"?}
    
    E -->|Yes| G[Validate text field]
    F -->|Yes| H[Validate image_url dict]
    F -->|No| I[Error: Unknown type]
    
    H --> J{URL or Base64?}
    J -->|Base64| K[Check size < 10MB]
    K --> L[Count images]
    J -->|URL| L

Source: src/together/utils/files.py

Python Client Usage

Initialization

from together import Together

client = Together()

The client automatically reads the TOGETHER_API_KEY environment variable. You can also pass the key explicitly:

client = Together(api_key="your-api-key-here")

File Operations

#### Upload a File

response = client.files.upload(
    file="training_data.jsonl",  # path to the local JSONL file
    purpose="fine-tune"
)
print(response.id)

#### List Files

files = client.files.list()

for file in files.data:
    print(f"ID: {file.id}, Filename: {file.filename}, Size: {file.bytes}")

#### Retrieve File Metadata

file_info = client.files.retrieve(id="file-xxxxx")
print(f"Created: {file_info.created_at}")
print(f"Filename: {file_info.filename}")

#### Retrieve File Content

content = client.files.retrieve_content(id="file-xxxxx")
print(content)

#### Delete a File

result = client.files.delete(id="file-xxxxx")
print(result.deleted)

Source: src/together/filemanager.py

CLI Usage

The together files command provides a command-line interface for file operations.

Command Overview

together files --help

| Command | Description |
| --- | --- |
| together files check | Validate a local file before uploading |
| together files upload | Upload a file to Together AI |
| together files list | List all uploaded files |
| together files retrieve | Get file metadata |
| together files retrieve-content | Download file content |
| together files delete | Delete a remote file |

Source: README.md

Check File (Local Validation)

Validate your JSONL file locally before uploading:

together files check example.jsonl

This runs the same validation logic that the SDK uses, checking:

  • JSONL format validity
  • Content structure
  • Multimodal content rules
  • Image size limits

Upload a File

together files upload example.jsonl

List Files

together files list

Retrieve File Metadata

together files retrieve file-6f50f9d1-5b95-416c-9040-0799b2b4b894

Retrieve File Content

together files retrieve-content file-6f50f9d1-5b95-416c-9040-0799b2b4b894

Delete a Remote File

together files delete file-6f50f9d1-5b95-416c-9040-0799b2b4b894

Data Flow for Fine-tuning

The Files API integrates directly with the Fine-tuning API. Here's how files flow through the system:

sequenceDiagram
    participant User
    participant CLI as Files CLI
    participant SDK as Python SDK
    participant API as Together API
    participant FT as Fine-tuning
    
    User->>CLI: together files upload data.jsonl
    CLI->>SDK: client.files.upload()
    SDK->>SDK: Validate locally
    SDK->>API: POST /v1/files
    API-->>SDK: {id: "file-xxxxx"}
    SDK-->>CLI: File upload response
    
    User->>CLI: together fine-tuning create
    CLI->>SDK: client.fine_tuning.create(training_file="file-xxxxx")
    SDK->>API: POST /v1/fine_tuning/jobs
    API-->>SDK: {id: "ft-xxxxx"}
    SDK-->>CLI: Fine-tuning job response

Error Handling

The SDK defines several exception types for file-related errors:

| Exception | Use Case |
| --- | --- |
| TogetherException | Base exception class |
| FileTypeError | Invalid file type or format |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout |

Source: src/together/error.py

Handling Upload Errors

from together import Together
from together.error import FileTypeError, APIConnectionError

client = Together()

try:
    response = client.files.upload(
        file="data.jsonl",  # path to the local JSONL file
        purpose="fine-tune"
    )
except FileTypeError as e:
    print(f"Invalid file format: {e}")
except APIConnectionError as e:
    print(f"Connection error: {e}")

Common Issues

File Format Validation Failures

The local validation (together files check) should be run before uploading. This catches the most common issues:

  1. Missing type field: Every content item must have a type field
  2. Invalid type value: Must be either "text" or "image_url"
  3. Missing text field: Text items must have a text string field
  4. Image in non-user message: Images are only allowed in user roles
  5. Base64 size exceeded: Images must be under 10MB when base64-encoded

Fine-tuning Integration

Files uploaded via the Files API can be used in fine-tuning jobs:

from together import Together

client = Together()

# Upload training file
training_file = client.files.upload(
    file="train.jsonl",  # path to the local JSONL file
    purpose="fine-tune"
)

# Create fine-tuning job with uploaded file
job = client.fine_tuning.create(
    training_file=training_file.id,
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct"
)

Source: src/together/resources/finetune.py

Configuration Options

File Upload Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | str or Path | Yes | Path to the local file to upload |
| purpose | string | Yes | Intended use (e.g., "fine-tune") |

File Check Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_path | string | Yes | Path to local file |

Best Practices

  1. Always validate locally first: Run together files check before uploading to catch format errors early
  2. Use descriptive filenames: Makes files easier to identify in the file list
  3. Check file size: Large files may take longer to upload and process
  4. Verify JSONL format: Ensure each line is valid JSON
  5. Test with small dataset first: Validate your pipeline with a subset before full upload
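
For point 4, a quick standard-library pre-check can flag lines that are not valid JSON objects before together files check is run. This sketch does not replace the SDK's validation; find_invalid_jsonl_lines is a hypothetical helper, and treating blank lines as errors is an assumption.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_jsonl_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that is not a JSON object."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((line_number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "line is not a JSON object"))
    return problems


sample_lines = ['{"text": "ok"}', '{"text": ', "[1, 2]", ""]
for line_number, reason in find_invalid_jsonl_lines(sample_lines):
    print(f"line {line_number}: {reason}")
```

To check a file, pass an open file handle, for example find_invalid_jsonl_lines(open("train.jsonl", encoding="utf-8")).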


Fine-Tuning

Related topics: Files API


The Fine-Tuning module in the Together Python SDK provides a comprehensive interface for customizing foundation models on the Together Inference API. This module enables developers to adapt pre-trained models to their specific use cases through supervised fine-tuning, LoRA (Low-Rank Adaptation), and advanced alignment methods like DPO (Direct Preference Optimization).

Overview

Fine-tuning transforms a pre-trained model into a specialized tool tailored for specific tasks, domains, or behaviors. The Together platform supports multiple fine-tuning methodologies:

| Training Method | Description | Use Case |
| --- | --- | --- |
| Full Training | Updates all model weights | Maximum customization, larger datasets |
| LoRA | Low-Rank Adaptation with adapter weights | Efficient fine-tuning, lower compute costs |
| DPO | Direct Preference Optimization | Alignment and preference learning |
| RPO | Relative Preference Optimization | Alternative alignment approach |
| SimPO | Simple Preference Optimization | Simplified alignment without reference model |

Source: src/together/resources/finetune.py

Architecture

The fine-tuning system follows a layered architecture with the FineTuning class serving as the primary interface:

graph TD
    A[User Application] --> B[Together Client]
    B --> C[FineTuning Class]
    C --> D[APIRequestor]
    D --> E[Together Inference API]
    
    F[CLI Commands] --> C
    G[Legacy API] --> C
    
    H[File Validation] --> C
    I[Checkpoint Management] --> C
    J[Price Estimation] --> C

Core Components

| Component | Location | Purpose |
| --- | --- | --- |
| FineTuning | resources/finetune.py | Main API interface for fine-tuning operations |
| FineTuneCreateRequest | types/finetune.py | Request payload model for job creation |
| CLI Commands | cli/api/finetune.py | Command-line interface for fine-tuning |
| Legacy API | legacy/finetune.py | Backward-compatible wrapper functions |
| File Validation | utils/files.py | Dataset file format validation |

Creating Fine-Tuning Jobs

Python Client

The FineTuning.create() method initiates a new fine-tuning job. The method accepts numerous parameters to customize the training process:

from together import Together

client = Together()

response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    validation_file="file-def456",
    n_epochs=3,
    batch_size=4,
    learning_rate=1e-5,
    suffix="my-custom-model",
    wandb_api_key="your-wandb-key",
    wandb_project_name="my-project",
)
print(response)

Source: src/together/resources/finetune.py

Supported Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | Required | Base model identifier (e.g., meta-llama/Llama-3-8b-hf) |
| training_file | str | Required | Uploaded training file ID |
| validation_file | str | Optional | Uploaded validation file ID |
| n_epochs | int | 3 | Number of training epochs |
| n_checkpoints | int | 1 | Number of checkpoints to save |
| batch_size | int | Auto | Training batch size |
| learning_rate | float | 1e-5 | Initial learning rate |
| lr_scheduler_type | str | cosine | Learning rate scheduler |
| warmup_ratio | float | 0.1 | Warmup ratio for learning rate |
| weight_decay | float | 0.01 | Weight decay coefficient |
| max_grad_norm | float | 1.0 | Maximum gradient norm |
| suffix | str | None | Custom suffix for output model name |
| lora | bool | False | Enable LoRA fine-tuning |
| lora_r | int | 8 | LoRA attention dimension |
| lora_dropout | float | 0.05 | LoRA dropout probability |
| lora_alpha | int | 16 | LoRA alpha parameter |
| train_on_inputs | bool | None | Mask user messages in training |
| train_vision | bool | False | Train vision encoder (multimodal models) |
| training_method | str | sft | Training method (dpo, rpo, simpo) |
| from_checkpoint | str | None | Resume from previous job checkpoint |
| from_hf_model | str | None | HuggingFace model to continue training from |

Source: src/together/resources/finetune.py
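
To build intuition for how learning_rate, warmup_ratio, and a cosine lr_scheduler_type interact, the sketch below computes a learning rate per step using a common formulation: linear warmup to the peak rate, then cosine decay to zero. This is an illustration only; the schedule the Together platform actually applies may differ in its details, and learning_rate_at is a hypothetical function.

```python
import math


def learning_rate_at(
    step: int,
    total_steps: int,
    peak_lr: float = 1e-5,
    warmup_ratio: float = 0.1,
) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (a common formulation)."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(step, f"{learning_rate_at(step, total_steps=1000):.2e}")
```

With the defaults from the table and 1000 total steps, the rate climbs during the first 100 steps, reaches 1e-5 at step 100, and falls to half of that midway through the decay phase.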

Async Support

For asynchronous applications, use AsyncTogether with the async FineTuning methods:

import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def create_ft_job():
    response = await async_client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
        n_epochs=3,
    )
    return response

result = asyncio.run(create_ft_job())

Source: src/together/resources/finetune.py

Managing Fine-Tuning Jobs

Job Lifecycle

stateDiagram-v2
    [*] --> Created: create()
    Created --> Queued: Submitted
    Queued --> Running: Started
    Running --> Completed: Success
    Running --> Failed: Error
    Running --> Cancelled: cancel()
    Queued --> Cancelled: cancel()

Listing Jobs

Retrieve all fine-tuning jobs associated with your account:

response = client.fine_tuning.list()
for job in response.data:
    print(f"ID: {job.id}, Model: {job.model}, Status: {job.status}")

Retrieving Job Details

Get detailed information about a specific fine-tuning job:

job = client.fine_tuning.retrieve(id="ft-job-abc123")
print(f"Status: {job.status}")
print(f"Training steps: {job.training_steps}")
print(f"Output model: {job.output_name}")

Cancelling Jobs

Abort a running or queued fine-tuning job:

result = client.fine_tuning.cancel(id="ft-job-abc123")

Source: src/together/resources/finetune.py
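
Because a job moves through several states before finishing, scripts often poll until it reaches a terminal state. The sketch below is hedged accordingly: wait_for_job is a hypothetical helper, the status strings are assumptions based on the lifecycle diagram, and fetch_status is an injected function so the example runs offline. In real use it would wrap client.fine_tuning.retrieve and return the job status as a string.

```python
import time
from typing import Callable

# Terminal states taken from the lifecycle diagram; the exact status
# strings returned by the API are an assumption.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_interval: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until a terminal status appears, then return it."""
    for _ in range(max_polls):
        status = fetch_status().lower()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_interval)
    raise TimeoutError(f"job did not finish within {max_polls} polls")


# Offline demo: a scripted sequence of statuses instead of real API calls
fake_statuses = iter(["queued", "running", "running", "completed"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```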

Checkpoint Management

Checkpoints enable resuming training from intermediate states and retrieving model weights for deployment.

Retrieving Checkpoints

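checkpoints = client.fine_tuning.list_checkpoints(id="ft-job-abc123")
for checkpoint in checkpoints:
    print(f"Name: {checkpoint.name}, Type: {checkpoint.type}, Created: {checkpoint.timestamp}")

The _parse_raw_checkpoints() helper processes raw checkpoint metadata. The excerpt below shows its core loop, where checkpoints is the raw list returned by the API and id is the job ID: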

parsed_checkpoints = []
for checkpoint in checkpoints:
    step = checkpoint["step"]
    checkpoint_type = checkpoint["checkpoint_type"]
    checkpoint_name = (
        f"{id}:{step}" if "intermediate" in checkpoint_type.lower() else id
    )
    parsed_checkpoints.append(
        FinetuneCheckpoint(
            type=checkpoint_type,
            timestamp=checkpoint["created_at"],
            name=checkpoint_name,
        )
    )

Source: src/together/resources/finetune.py
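
The naming rule in the excerpt can be isolated as a small function: intermediate checkpoints are named job ID plus step, and every other checkpoint is named by the job ID alone. In this sketch, checkpoint_name is a hypothetical function and the type strings "Intermediate" and "Final" are example values, not confirmed API output.

```python
def checkpoint_name(job_id: str, step: int, checkpoint_type: str) -> str:
    """Mirror the excerpt's rule: '<job_id>:<step>' for intermediate checkpoints."""
    if "intermediate" in checkpoint_type.lower():
        return f"{job_id}:{step}"
    return job_id


print(checkpoint_name("ft-job-abc123", 1000, "Intermediate"))
print(checkpoint_name("ft-job-abc123", 2000, "Final"))
```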

Download Checkpoints

Download fine-tuned model weights using the CLI:

# Download latest checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Download specific checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-step 1000

# Download with specific checkpoint type
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-type merged

#### Checkpoint Types

| Type | Description | Applicable Training |
| --- | --- | --- |
| default | Default output format | All |
| merged | Merged with base model (LoRA only) | LoRA |
| adapter | Adapter weights only (LoRA only) | LoRA |
| model_output_path | Full model output (Full only) | Full |

Source: src/together/cli/api/finetune.py

Download Options

| CLI Option | Description |
| --- | --- |
| --output_dir, -o | Output directory for downloaded files |
| --checkpoint-step, -s | Specific checkpoint step to download |
| --checkpoint-type | Checkpoint type (default, merged, adapter) |

The same download can be started from the Python client:

from together import Together
from together.types.finetune import DownloadCheckpointType

client = Together()

result = client.fine_tuning.download(
    id="ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b",
    output="./model-output",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED,
)
print(f"Downloaded to: {result.filename}")

CLI Commands

The Together CLI provides a comprehensive set of commands for fine-tuning operations:

Create a Fine-Tuning Job

together fine-tuning create \
    --model meta-llama/Llama-3-8b-hf \
    --training-file file-abc123 \
    --n-epochs 3 \
    --suffix my-custom-model

List Fine-Tuning Jobs

together fine-tuning list

Retrieve Job Details

together fine-tuning retrieve ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

Cancel a Job

# With confirmation prompt
together fine-tuning cancel ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

Delete a Job

together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b

# Force deletion without confirmation
together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --force

Source: src/together/cli/api/finetune.py

Weights & Biases Integration

The SDK supports automatic logging to Weights & Biases for experiment tracking:

together fine-tuning create \
    --model meta-llama/Llama-3-8b-hf \
    --training-file file-abc123 \
    --wandb-api-key your-api-key \
    --wandb-project-name my-project \
    --wandb-name my-experiment-run

| Parameter | Description |
| --- | --- |
| --wandb-api-key | Weights & Biases API key |
| --wandb-project-name | W&B project name |
| --wandb-name | W&B run name |
| --wandb-base-url | W&B base URL (for enterprise deployments) |

Source: src/together/cli/api/finetune.py
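To avoid hard-coding the key in scripts, the W&B flags can be built from the environment. A minimal sketch, assuming the key is stored in the `WANDB_API_KEY` environment variable; the helper is hypothetical, and the flags are the ones listed above:

```python
import os
from typing import List, Mapping, Optional


def wandb_cli_flags(
    project_name: str,
    run_name: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Build the W&B flags for `together fine-tuning create`.

    Returns an empty list when no API key is available, so the job is
    created without W&B logging rather than with a blank key.
    """
    api_key = env.get("WANDB_API_KEY")
    if not api_key:
        return []
    flags = ["--wandb-api-key", api_key, "--wandb-project-name", project_name]
    if run_name:
        flags += ["--wandb-name", run_name]
    if base_url:
        flags += ["--wandb-base-url", base_url]
    return flags
```

The resulting list can be appended to the rest of the create command before it is executed.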

File Format Requirements

Training and validation files must follow specific JSONL (JSON Lines) format requirements:

Instruction Tuning Format

{"text": "What is the capital of France?\nAnswer: Paris"}
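JSONL means one complete JSON object per line. A minimal sketch of writing records in that shape with the standard library; the function name is illustrative:

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def write_jsonl(records: Iterable[Dict], path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Serializing with json.dumps rather than formatting strings by hand keeps quotes and newlines inside the text correctly escaped.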

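Chat/Conversation Format

Each object below is a single message, shown on its own line for readability. In a training record, objects like these are the entries of the record's messages list.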

{"content": [{"type": "text", "text": "What is the capital of France?"}], "role": "user"}
{"content": [{"type": "text", "text": "Paris"}], "role": "assistant"}

Multimodal Format (with Images)

{"content": [{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}], "role": "user"}

Validation Rules

The file validation system enforces the following rules:

| Rule | Error | Source |
| --- | --- | --- |
| File must be valid JSONL | InvalidFileFormatError | utils/files.py |
| Content must be a list of dicts | InvalidFileFormatError | utils/files.py |
| Each item must have type field | InvalidFileFormatError | utils/files.py |
| Text items must have text field (string) | InvalidFileFormatError | utils/files.py |
| Image items must be in user messages only | InvalidFileFormatError | utils/files.py |
| Image items must have image_url dict | InvalidFileFormatError | utils/files.py |

Source: src/together/utils/files.py
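The content rules above can be checked locally before uploading a file. A minimal sketch that mirrors the table; it is not the SDK's validator, and it collects messages in a list instead of raising InvalidFileFormatError:

```python
from typing import Any, List


def check_message_content(content: Any, role: str) -> List[str]:
    """Return a list of rule violations for one message's `content` value.

    Illustrative re-statement of the rules in the table; the SDK's own
    checks live in utils/files.py and may differ in detail.
    """
    if not isinstance(content, list) or not all(isinstance(i, dict) for i in content):
        return ["content must be a list of dicts"]
    problems = []
    for index, item in enumerate(content):
        item_type = item.get("type")
        if item_type is None:
            problems.append(f"item {index}: missing 'type' field")
        elif item_type == "text":
            if not isinstance(item.get("text"), str):
                problems.append(f"item {index}: text item needs a string 'text' field")
        elif item_type == "image_url":
            if role != "user":
                problems.append(f"item {index}: image items are allowed in user messages only")
            if not isinstance(item.get("image_url"), dict):
                problems.append(f"item {index}: image item needs an 'image_url' dict")
    return problems
```

An empty result means the content passed every rule in this sketch; it does not guarantee that the server-side validation will accept the file.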

Error Handling

The fine-tuning module defines specific exception types for different failure scenarios:

Exception Types

| Exception | Use Case |
| --- | --- |
| TogetherException | Base exception class |
| RateLimitError | API rate limit exceeded |
| FileTypeError | Invalid file format |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout |

Source: src/together/error.py

Error Response Model

from together.types.error import TogetherErrorResponse

error_response = TogetherErrorResponse(
    message="Invalid training file format",
    type="validation_error",
    param="training_file",
    code="INVALID_FORMAT"
)

Handling Errors

from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"Fine-tuning error: {e}")
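Printing a message on a rate limit leaves the retry to the caller. A minimal sketch of retrying with exponential backoff; the helper name, attempt count, and delays are assumptions for illustration, and the exception classes to retry on are passed in by the caller:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each retry: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

With the client from the example above, a call might look like call_with_backoff(lambda: client.fine_tuning.create(model="meta-llama/Llama-3-8b-hf", training_file="file-abc123"), (RateLimitError,)). Other exceptions propagate immediately.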

Legacy API

The SDK provides backward-compatible wrappers in the legacy module:

from together.legacy.finetune import Finetune

# Deprecated but still functional; emits a deprecation warning when called
response = Finetune.create(
    training_file="file-abc123",
    model="meta-llama/Llama-3-8b-hf",
    n_epochs=3,
)

⚠️ Warning: The legacy functions emit deprecation warnings. Use the client.fine_tuning interface for new projects.

Source: src/together/legacy/finetune.py

Common Patterns

Resuming from Checkpoint

Continue training from a previous fine-tuning job:

response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    from_checkpoint="ft-previous-job:1000",  # Resume from step 1000
)

Fine-tuning from HuggingFace Model

Start training from a HuggingFace Hub model:

response = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-hf",
    training_file="file-abc123",
    from_hf_model="username/my-finetuned-model",
    hf_model_revision="v1.0",
)

Training with Price Limits

The SDK includes price estimation to prevent unexpected costs:

price_estimation = client.fine_tuning.estimate_price(
    training_file="file-abc123",
    model="meta-llama/Llama-3-8b-hf",
    n_epochs=3,
    training_type="lora",
)

if price_estimation.allowed_to_proceed:
    response = client.fine_tuning.create(...)
else:
    print(f"Estimated cost ${price_estimation.estimated_cost} exceeds limit")

Source: src/together/resources/finetune.py

Price Estimation

The price estimation feature helps users understand the expected cost before starting a fine-tuning job:

graph LR
    A[User Creates Job] --> B{from_checkpoint or from_hf_model?}
    B -->|No| C[Estimate Price]
    B -->|Yes| D[Skip Estimation]
    C --> E{Cost within limits?}
    E -->|Yes| F[Submit Job]
    E -->|No| G[Show Warning]
    D --> F

Price estimation is automatically performed when creating jobs without a checkpoint or HuggingFace model source, unless explicitly disabled.
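The flow above can be sketched as a small decision function. This is an illustration of the diagram, not the SDK's implementation; `is_within_limit` is a hypothetical stand-in for the price estimation call:

```python
from typing import Callable, Optional


def decide_submission(
    from_checkpoint: Optional[str],
    from_hf_model: Optional[str],
    is_within_limit: Callable[[], bool],
) -> str:
    """Follow the flow above and return "submit" or "warn".

    `is_within_limit` is only invoked when neither a checkpoint nor a
    HuggingFace model is given, matching the "Skip Estimation" branch.
    """
    if from_checkpoint or from_hf_model:
        return "submit"  # estimation is skipped for these sources
    return "submit" if is_within_limit() else "warn"
```

Passing the estimate as a callable keeps the skipped branch from triggering an unnecessary API request.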

Source: https://github.com/togethercomputer/together-python / Human Manual

Doramagic Pitfall Log

Doramagic extracted 12 source-linked risk signals. Review them before installing or handing real data to the project.

1. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:624113979 | https://github.com/togethercomputer/together-python | README/documentation is current enough for a first validation pass.

2. Project risk: v.1.5.31

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v.1.5.31. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.31

3. Project risk: v.1.5.33

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v.1.5.33. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.33

4. Project risk: v1.5.28

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v1.5.28. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v1.5.28

5. Maintenance risk: v.1.5.29

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: v.1.5.29. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.29

6. Maintenance risk: v1.5.27

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: v1.5.27. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v1.5.27

7. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | last_activity_observed missing

8. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium

9. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium

10. Security or permission risk: `LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: LogProbs.top_logprobs typed as Dict but API returns List[Dict]. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/issues/443

11. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | issue_or_pr_quality=unknown

12. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 8 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using together-python with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence