Doramagic Project Pack · Human Manual
together-python
Related topics: Installation and Setup, Client Architecture
Overview
The together-python repository is an official Python SDK and Command Line Interface (CLI) for interacting with the Together AI API. It provides developers with programmatic access to a wide range of large language models (LLMs), image generation models, embedding services, and fine-tuning capabilities hosted on the Together platform.
Source: README.md
Source: https://github.com/togethercomputer/together-python / Human Manual
Installation and Setup
Related topics: Overview
This page covers the complete installation and setup process for the Together Python SDK (together-python), including prerequisites, configuration options, CLI setup, and development environment configuration.
Overview
The together-python SDK provides a Python interface and command-line tool for interacting with the Together AI API. It enables developers to:
- Access chat completions with support for multimodal inputs (text and images)
- Generate text completions
- Create and manage fine-tuning jobs
- Generate images
- Compute embeddings and reranking
- Manage files and model resources
Source: README.md
Prerequisites
Python Version Requirements
The SDK requires Python 3.10 or higher. The project uses modern Python features including type hints and async/await patterns.
API Key
A valid Together AI API key is required for all API operations. You can obtain an API key by:
- Creating an account at api.together.ai
- Navigating to the API keys settings page
Source: README.md
Installation Methods
Standard Installation (pip)
Install the latest stable release from PyPI:
pip install together
Poetry Installation
For projects using Poetry as the dependency manager:
poetry add together
Source: CONTRIBUTING.md
Development Installation
For contributors who want to modify the source code or run tests locally:
# Clone the repository
git clone https://github.com/togethercomputer/together-python.git
cd together-python
# Install with development dependencies
poetry install --with quality,tests
Source: CONTRIBUTING.md
Configuration
Environment Variables
The SDK supports configuration through environment variables. The primary variable required is:
| Environment Variable | Description | Required |
|---|---|---|
| TOGETHER_API_KEY | Your Together AI API key | Yes |
#### Setting the API Key
Unix/Linux/macOS:
export TOGETHER_API_KEY=xxxxx
Windows (Command Prompt):
set TOGETHER_API_KEY=xxxxx
Windows (PowerShell):
$env:TOGETHER_API_KEY="xxxxx"
Source: README.md
Client Configuration
The Python client can be initialized with or without an explicit API key:
Using environment variable (recommended):
from together import Together
client = Together() # Automatically reads TOGETHER_API_KEY
Explicit API key:
from together import Together
client = Together(api_key="your-api-key-here")
Source: README.md
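The two initialization styles above amount to a simple precedence rule: an explicit key wins, otherwise the environment variable is used. The sketch below illustrates that rule so a script can fail early with a clear message. `resolve_api_key` is a hypothetical helper written for this manual and is not part of the SDK, which performs its own lookup when the client is constructed.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else TOGETHER_API_KEY from env.

    Hypothetical helper for illustration; not part of the together package.
    """
    key = (explicit or env.get("TOGETHER_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "No API key found: pass api_key=... or set TOGETHER_API_KEY."
        )
    return key
```

A caller could pass the result on as `Together(api_key=resolve_api_key())`, which keeps the failure mode in one place.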
Optional Dependencies
The SDK uses Poetry for dependency management. Some features require optional dependencies:
| Extra | Purpose | Install Command |
|---|---|---|
| extended_testing | Additional testing dependencies | poetry install --with extended_testing |
When adding new dependencies, maintainers follow a strict policy: dependencies should be optional and users who don't have them installed should be able to import the SDK without warnings or errors.
Source: CONTRIBUTING.md
CLI Setup
Installation
The CLI is included with the main package installation. After installing together, the together command becomes available.
Verification
Verify the CLI installation:
together --help
Common CLI Commands
| Command | Description |
|---|---|
| together chat | Chat completions |
| together completions | Text completions |
| together images | Image generation |
| together files | File management |
| together fine-tuning | Fine-tuning operations |
| together models | List and manage models |
Source: README.md
Client Initialization Patterns
Synchronous Client
from together import Together
client = Together()
Asynchronous Client
from together import AsyncTogether
async_client = AsyncTogether()
Basic Usage Flow
graph TD
A[Install together package] --> B[Set TOGETHER_API_KEY]
B --> C[Import Together or AsyncTogether]
C --> D[Initialize client]
D --> E[Call API methods]
E --> F[Process response]
SDK Constants
The SDK defines several constants in src/together/constants.py:
| Constant | Purpose |
|---|---|
| API base URLs | Endpoint configurations |
| Default timeouts | Request timeout values |
| Version information | SDK version tracking |
Source: src/together/constants.py
Error Handling Setup
The SDK provides a comprehensive error hierarchy for handling API-related issues:
Exception Types
| Exception Class | Purpose |
|---|---|
| TogetherException | Base exception class |
| RateLimitError | Handle rate limiting |
| FileTypeError | File format validation errors |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout handling |
| AuthenticationError | Invalid API key errors |
Source: src/together/error.py
Error Response Format
API errors are returned with structured information:
class TogetherErrorResponse(BaseModel):
    message: str | None = None  # Error message
    type: str | None = None     # Error type
    param: str | None = None    # Parameter causing error
    code: str | None = None     # Error code
Source: src/together/types/error.py
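To make the four fields concrete, here is a minimal standard-library sketch that parses an error body into a stand-in structure. `ErrorInfo` and `parse_error` are hypothetical names, and the flat JSON layout of the sample is an assumption about the payload, not something taken from the SDK source.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorInfo:
    """Stand-in mirroring the four optional fields listed above."""

    message: Optional[str] = None
    type: Optional[str] = None
    param: Optional[str] = None
    code: Optional[str] = None


def parse_error(body: str) -> ErrorInfo:
    """Parse a JSON error body, ignoring keys outside the four fields."""
    data = json.loads(body)
    known = {k: data.get(k) for k in ("message", "type", "param", "code")}
    return ErrorInfo(**known)


# Assumed sample payload; the extra key is dropped, missing keys become None.
sample = '{"message": "Invalid API key", "type": "invalid_request_error", "extra": 1}'
info = parse_error(sample)
print(info.message, info.type, info.param)
```

Because every field is optional, code that inspects an error should be prepared for any of them to be `None`.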
Error Handling Example
from together import Together
from together.error import TogetherException, RateLimitError

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError:
    # Catch the specific subclass before the base class
    print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
    print(f"API error: {e}")
Known Compatibility Issues
Typer Version Conflict
Note: The SDK has a dependency constraint on typer<0.16.0. If your project requires typer>=0.16.0, you may encounter dependency conflicts. See Issue #348 for tracking.
This is a known community issue where projects depending on newer typer versions cannot use together-python without resolving the conflict.
Pillow Version
Note: The SDK's image processing may have transitive dependency issues with pillow>=11.0.0 when used alongside libraries like autogen 0.4.2. See Issue #237 for details.
Development Environment Setup
1. Install Poetry
Follow the official Poetry installation guide.
Important: If you use Conda or Pyenv, create and activate a new environment before installing Poetry:
```bash
conda create -n together python=3.10
conda activate together
```
2. Configure Poetry
Tell Poetry to use the active Python environment:
poetry config virtualenvs.prefer-active-python true
3. Install Dependencies
poetry install --with quality,tests
4. Set Up Pre-commit Hooks
The project uses pre-commit for auto-formatting and linting:
pre-commit install
Source: CONTRIBUTING.md
Running Tests
#### Unit Tests
make tests
#### Integration Tests
Warning: Integration tests require an active API key and will incur charges.
make integration_tests
Source: CONTRIBUTING.md
Formatting and Linting
Before submitting changes, run formatting locally:
make format
The CI system automatically checks formatting, linting, and tests.
Source: CONTRIBUTING.md
Quick Start Checklist
| Step | Task | Command/Action |
|---|---|---|
| 1 | Check Python version | python --version (requires 3.10+) |
| 2 | Install SDK | pip install together |
| 3 | Set API key | export TOGETHER_API_KEY=xxxxx |
| 4 | Verify installation | python -c "from together import Together; print('OK')" |
| 5 | Test basic call | Run a simple chat completion |
See Also
- Chat Completions - Using the chat API
- Fine-tuning - Training custom models
- Image Generation - Creating images
- Error Handling - Handling API errors
- Contributing Guide - Development contribution guidelines
Client Architecture
Related topics: Chat Completions, Type System
Overview
The Together Python SDK provides a unified interface for interacting with the Together AI platform through both a programmatic Python client and a command-line interface (CLI). The client architecture follows a layered design pattern that separates concerns between API communication, resource management, and user-facing interfaces.
The architecture is designed to support multiple API capabilities including chat completions, text completions, embeddings, image generation, file management, and fine-tuning operations. Source: src/together/resources/__init__.py
Core Components
The SDK architecture consists of three primary layers that work together to provide a seamless developer experience:
graph TD
A[User Application] --> B[Together Client]
B --> C[Resource Modules]
C --> D[API Requestor]
D --> E[Together AI API]
E --> D
D --> B
B --> A
F[CLI Commands] --> B
subgraph Resources
G[Chat Completions]
H[Completions]
I[Embeddings]
J[Images]
K[Files]
L[Fine-tuning]
end
C --> G
C --> H
C --> I
C --> J
C --> K
C --> L
Together Client Class
The Together class serves as the main entry point for the SDK. It provides a synchronous interface for all API operations and manages the underlying HTTP client configuration.
Key Responsibilities:
- Initialization and configuration of API credentials
- Delegation of requests to appropriate resource modules
- Streaming response handling
- Timeout and connection management
Source: src/together/client.py
Basic Initialization:
from together import Together
# Using environment variable (TOGETHER_API_KEY)
client = Together()
# Explicit API key
client = Together(api_key="your-api-key-here")
API Requestor
The APIRequestor class handles the low-level communication with the Together AI API. It abstracts away HTTP details and provides a consistent interface for both synchronous and asynchronous operations.
Requestor Responsibilities:
- Constructing HTTP requests with proper authentication headers
- Handling request serialization and response parsing
- Managing streaming responses
- Implementing retry logic for transient failures
- Processing error responses into typed exceptions
Source: src/together/abstract/api_requestor.py
Resource Modules
Resource modules encapsulate API operations by domain. Each resource module provides type-safe methods for a specific category of API endpoints.
| Resource Module | Purpose | Key Methods |
|---|---|---|
| chat.completions | Chat-based language model interactions | create(), streaming variants |
| completions | Text completion operations | create(), streaming variants |
| embeddings | Text embedding generation | create() |
| images | Image generation | generate() |
| files | File upload, retrieval, and management | upload(), retrieve(), list(), delete() |
| fine_tuning | Model fine-tuning operations | create(), retrieve(), list(), cancel(), download() |
Source: src/together/resources/__init__.py
Request/Response Flow
Synchronous Request Flow
sequenceDiagram
participant App as Application Code
participant Client as Together Client
participant Resource as Resource Module
participant Requestor as API Requestor
participant API as Together AI API
App->>Client: client.chat.completions.create(...)
Client->>Resource: delegating request
Resource->>Resource: build request parameters
Resource->>Requestor: request()
Requestor->>API: POST /chat/completions
API-->>Requestor: JSON Response
Requestor->>Resource: parse response
Resource-->>Client: typed response object
Client-->>App: ChatCompletionResponse
Streaming Response Handling
The SDK supports server-sent events (SSE) streaming for real-time token delivery. Streaming is handled differently depending on the API endpoint:
Chat Completions Streaming:
from together import Together
client = Together()
stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
# Print each piece as it arrives; the "or" guards against an empty delta.content
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
Source: src/together/resources/chat/completions.py
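With the synchronous client shown above, iterating over the stream yields ChatCompletionChunk objects one at a time as they arrive. With AsyncTogether, the stream is an async iterator and is consumed with async for instead.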
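A common need when streaming is to collect the text while skipping chunks that carry none. The sketch below shows one way to do that using stand-in objects, so it runs without network access. `iter_text` and `fake_chunk` are hypothetical helpers; the only thing assumed about real chunks is the `choices[0].delta.content` attribute path used in the example above.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield non-empty text pieces from chat-completion-style chunks."""
    for chunk in chunks:
        if not chunk.choices:  # defensive: tolerate chunks without choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip None and empty strings
            yield piece


def fake_chunk(text):
    """Build a stand-in chunk exposing choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Hel"),
    fake_chunk(None),
    fake_chunk("lo"),
    SimpleNamespace(choices=[]),
]
collected = "".join(iter_text(fake_stream))
print(collected)
```

Since `iter_text` is a generator, a caller can print each piece immediately and still join the pieces afterwards if the full text is needed.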
Asynchronous Support
The SDK provides AsyncTogether for applications requiring concurrent API operations:
import asyncio
from together import AsyncTogether

async_client = AsyncTogether()

async def concurrent_requests():
    # Build five request coroutines, then await them concurrently
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Prompt {i}"}],
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    return responses
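When the number of prompts grows, launching every request at once can run into rate limits. One possible refinement, sketched here with a stand-in coroutine rather than a real API call, caps the number of requests in flight with a semaphore. `gather_bounded` and `fake_call` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    prompts: List[str],
    call: Callable[[str], Awaitable[str]],
    limit: int = 2,
) -> List[str]:
    """Run call(prompt) for every prompt with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await call(prompt)

    # asyncio.gather returns results in the same order as its inputs
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def fake_call(prompt: str) -> str:
    """Stand-in for an awaitable SDK call; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return prompt.upper()


results = asyncio.run(gather_bounded(["a", "b", "c"], fake_call))
print(results)
```

In real use, `call` would wrap something like `async_client.chat.completions.create(...)`; a suitable `limit` depends on the account's rate limits, which this manual does not specify.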
Error Handling
The SDK defines a hierarchy of exception types for different error conditions, enabling precise error handling in application code.
Exception Hierarchy
TogetherException (base)
├── RateLimitError
├── FileTypeError
├── AttributeError
├── Timeout
├── APIConnectionError
Source: src/together/error.py
Error Response Model
API error responses are parsed into structured TogetherErrorResponse objects:
| Field | Type | Description |
|---|---|---|
| message | `str \| None` | Human-readable error message |
| type | `str \| None` | Error category/type |
| param | `str \| None` | Parameter that caused the error |
| code | `str \| None` | Machine-readable error code |
Source: src/together/types/error.py
Error Handling Example
from together import Together
from together.error import RateLimitError, TogetherException

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError as e:
    # Catch the specific subclass before the base class
    print(f"Rate limited: {e}")
except TogetherException as e:
    print(f"API error: {e}")
File Validation Architecture
The SDK includes robust file validation for fine-tuning datasets, ensuring data integrity before upload.
graph LR
A[Input File] --> B{File Type Check}
B -->|JSONL| C[JSONL Validator]
B -->|JSON| D[JSON Validator]
C --> E{Content Validation}
D --> E
E --> F[Schema Validation]
F --> G[Size Limits Check]
G --> H[Upload Ready]
E -->|Invalid| I[InvalidFileFormatError]
Validation Rules
The file validation system enforces the following constraints:
| Rule | Limit | Description |
|---|---|---|
| Maximum base64 image size | 10MB | Per image in multimodal datasets |
| Maximum images per example | 5 | Images allowed in a single training example |
| Required fields | type, content | For each message in multimodal format |
Source: src/together/utils/files.py
Supported Content Types
| Type | Description | Role Restrictions |
|---|---|---|
| text | Plain text content | Any role |
| image_url | Base64-encoded image | User role only |
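The rules in the two tables above can be checked locally before an upload. The following is a simplified sketch of such a check, not the SDK's validator: `check_example` is a hypothetical function, it treats plain-string content as valid (an assumption), and it omits the image size limit.

```python
from typing import Any, Dict, List

MAX_IMAGES_PER_EXAMPLE = 5  # taken from the Validation Rules table


def check_example(messages: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in one multimodal training example."""
    problems = []
    image_count = 0
    for i, message in enumerate(messages):
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain-text message, nothing further to check here
        if not isinstance(content, list):
            problems.append(f"message {i}: content must be a string or a list")
            continue
        for item in content:
            if not isinstance(item, dict) or "type" not in item:
                problems.append(f"message {i}: each content item needs a 'type'")
            elif item["type"] == "text":
                if not isinstance(item.get("text"), str):
                    problems.append(f"message {i}: 'text' must be a string")
            elif item["type"] == "image_url":
                image_count += 1
                if message.get("role") != "user":
                    problems.append(
                        f"message {i}: only user messages can contain images"
                    )
            else:
                problems.append(
                    f"message {i}: unknown content type {item['type']!r}"
                )
    if image_count > MAX_IMAGES_PER_EXAMPLE:
        problems.append(
            f"too many images: {image_count} > {MAX_IMAGES_PER_EXAMPLE}"
        )
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in an example in a single pass.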
Fine-tuning Architecture
The fine-tuning module provides comprehensive support for training custom models on the Together platform.
Training Methods
The SDK supports multiple fine-tuning methodologies:
| Method | Description | Checkpoint Types |
|---|---|---|
| Full training | Updates all model weights | Default only |
| LoRA | Low-rank adaptation | Default, Merged, Adapter |
| DPO | Direct Preference Optimization | Default |
| SimPO | Simple Preference Optimization | Default |
| RPO | Reward Preference Optimization | Default |
Source: src/together/resources/finetune.py
Checkpoint Management
The fine-tuning resource handles checkpoint retrieval and download:
# List available checkpoints
checkpoints = client.fine_tuning.retrieve_checkpoints(fine_tune_id)
# Download specific checkpoint
result = client.fine_tuning.download(
    fine_tune_id,
    output="./checkpoints",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED,
)
CLI Architecture
The command-line interface is built using Click and mirrors the Python client functionality.
CLI Command Structure
together
├── chat completions
├── completions
├── embeddings
├── files
│ ├── check
│ ├── upload
│ ├── list
│ ├── retrieve
│ └── delete
├── fine-tuning
│ ├── create
│ ├── list
│ ├── retrieve
│ ├── cancel
│ ├── download
│ └── delete
└── models
    ├── list
    └── start
Source: src/together/cli/api/chat.py and src/together/cli/api/finetune.py
CLI Configuration
The CLI supports environment variable configuration:
# Set API key
export TOGETHER_API_KEY=your-api-key
# Use CLI
together chat completions --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
--message "Hello, world!"
Known Issues and Limitations
Dependency Compatibility
Issue #348: The SDK has a dependency constraint on typer<0.16.0, which may conflict with projects requiring newer versions of typer. This can cause dependency resolution failures in environments where multiple packages have conflicting typer requirements.
Issue #237: The pillow dependency version may conflict with transitive dependencies from other packages like autogen>=0.4.2 that require pillow>=11.0.0.
Model Type Validation
Issue #337: The ModelObject type definition may not include all valid model types, potentially causing Pydantic validation errors when working with newer or specialized model types like transcription models.
Tool Response Handling
Issue #113: Multi-turn function calling workflows may encounter validation errors when processing tool response messages with role='tool'. Applications implementing function calling should ensure proper message formatting according to the Together AI API specification.
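As a rough guide to what "proper message formatting" can look like, the sketch below builds a short tool-calling history and checks that each tool message refers to an earlier assistant tool call. It assumes the OpenAI-style message layout (`tool_calls` on the assistant message, `tool_call_id` on the tool message); the helper names and the `get_weather` function are hypothetical, and the exact fields the Together API expects should be confirmed against its specification.

```python
import json
from typing import Any, Dict, List


def tool_result_message(tool_call_id: str, result: Any) -> Dict[str, str]:
    """Build a role='tool' message carrying a JSON-encoded result."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }


def unmatched_tool_messages(messages: List[Dict[str, Any]]) -> List[str]:
    """Return tool_call_ids of tool messages with no earlier matching call."""
    seen = set()
    unmatched = []
    for message in messages:
        if message.get("role") == "assistant":
            for call in message.get("tool_calls") or []:
                seen.add(call["id"])
        elif message.get("role") == "tool":
            if message.get("tool_call_id") not in seen:
                unmatched.append(str(message.get("tool_call_id")))
    return unmatched


history = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }
        ],
    },
    tool_result_message("call_1", {"temp_c": 18}),
]
print(unmatched_tool_messages(history))
```

Running a check like this before sending the history makes a mismatched or missing `tool_call_id` visible locally rather than as a validation error.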
Best Practices
Connection Management
- Reuse the Together client instance across multiple requests to benefit from connection pooling
- Set appropriate timeout values for long-running operations like fine-tuning
Error Recovery
- Implement exponential backoff for RateLimitError handling
- Validate file contents locally before upload to avoid wasted API calls
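The backoff recommendation can be sketched with the standard library alone. The helper below is a generic illustration, not an SDK feature: `retry_with_backoff` is a hypothetical name, and the delay values and the 10% jitter are arbitrary choices.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add up to 10% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

With the SDK installed, a call might look like `retry_with_backoff(lambda: client.chat.completions.create(...), (RateLimitError,))`, with `RateLimitError` imported from `together.error`. The `sleep` parameter exists so the delays can be replaced in tests.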
Streaming Performance
- Process streaming chunks incrementally rather than buffering entire responses
- Use async variants (AsyncTogether) for applications making multiple concurrent requests
Type System
Related topics: Client Architecture
The together-python SDK employs a comprehensive type system built on Pydantic for data validation, serialization, and API interaction. This document provides a detailed reference for developers working with the SDK's type definitions, error handling, and validation patterns.
Overview
The type system serves three primary purposes within the together-python SDK:
- Data Validation: Ensures API request parameters meet expected formats before transmission
- Serialization: Converts Python objects to JSON for API communication and deserializes responses
- IDE Support: Provides type hints for better developer experience and autocomplete
graph TD
A[User Code] --> B[Pydantic Models]
B --> C{Validation}
C -->|Pass| D[API Request]
C -->|Fail| E[Validation Error]
D --> F[API Response]
F --> G[Response Models]
G --> H[User Code]
Base Types
Abstract Base Model
All SDK types inherit from BaseModel, which extends Pydantic's BaseModel with custom configuration:
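# Source: src/together/types/abstract.py
# Import added so the excerpt is self-contained; the alias name is illustrative.
from pydantic import BaseModel as PydanticBaseModel, ConfigDict

class BaseModel(PydanticBaseModel):
    """Base model for all Together API types."""

    model_config = ConfigDict(
        populate_by_name=True,
        validate_default=True,
        arbitrary_types_allowed=True,
    )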
The BaseModel configures:
- populate_by_name=True: Allows population by field name or alias
- validate_default=True: Validates default values during initialization
- arbitrary_types_allowed=True: Permits custom type annotations
Error Response Model
The TogetherErrorResponse type defines the structure for API error responses:
| Field | Type | Description |
|---|---|---|
| message | `str \| None` | Human-readable error message |
| type | `str \| None` | Error category/type |
| param | `str \| None` | Parameter that caused the error |
| code | `str \| None` | Machine-readable error code |
# Source: src/together/types/error.py
class TogetherErrorResponse(BaseModel):
    message: str | None = None
    type_: str | None = Field(None, alias="type")
    param: str | None = None
    code: str | None = None
Exception Hierarchy
The SDK defines a hierarchical exception system for granular error handling:
graph TD
A[TogetherException<br/>Base Exception] --> B[RateLimitError]
A --> C[FileTypeError]
A --> D[AttributeError]
A --> E[Timeout]
A --> F[APIConnectionError]
A --> G[InvalidRequestError]
A --> H[AuthenticationError]
A --> I[APIResponseError]
Exception Types
| Exception Class | Purpose | Common Cause |
|---|---|---|
| TogetherException | Base exception for all SDK errors | General failures |
| RateLimitError | API rate limit exceeded | Too many requests |
| FileTypeError | Invalid file type submitted | Unsupported file format |
| AttributeError | Invalid attribute access | Missing or invalid parameter |
| Timeout | Request timeout | Slow network or API |
| APIConnectionError | Network connectivity issue | Connection failure |
| InvalidRequestError | Malformed request | Invalid parameters |
| AuthenticationError | Authentication failure | Invalid API key |
| APIResponseError | Unexpected API response | Server-side error |
# Source: src/together/error.py
class RateLimitError(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)
Exception Construction Pattern
All exception types accept flexible message parameters:
# Source: src/together/error.py
class Timeout(TogetherException):
    def __init__(
        self,
        message: (
            TogetherErrorResponse | Exception | str | RequestException | None
        ) = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(message=message, **kwargs)
The message can be:
- TogetherErrorResponse: Parsed API error response
- Exception: Wrapped exception
- str: Direct error message
- RequestException: HTTP request exception
Request and Response Types
Chat Completions Types
The chat completions system uses structured types for requests and responses:
# Source: src/together/resources/chat/completions.py
response, _, _ = await requestor.arequest(
    options=TogetherRequest(
        method="POST",
        url="chat/completions",
        params=parameter_payload,
    ),
    stream=stream,
)
if stream:
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)
Streaming Response Types
Streaming responses yield ChatCompletionChunk objects:
| Field | Type | Description |
|---|---|---|
| choices | List[Choice] | Generated completions |
| model | str | Model identifier |
| id | str | Request identifier |
| usage | Usage | Token usage statistics |
File Validation Types
Content Item Types
The SDK validates file content for fine-tuning datasets:
# Source: src/together/utils/files.py
if item["type"] == "text":
    if "text" not in item or not isinstance(item["text"], str):
        raise InvalidFileFormatError(
            "The dataset is malformed, the `text` field must be present in the `content` item field and be"
            f" a string. Got '{item.get('text')!r}' instead.",
            line_number=idx + 1,
            error_source="key_value",
        )
elif item["type"] == "image_url":
    if role != "user":
        raise InvalidFileFormatError(
            "The dataset is malformed, only user messages can contain images.",
            line_number=idx + 1,
            error_source="key_value",
        )
Content Type Enumeration
| Type | Valid Context | Description |
|---|---|---|
| text | Any role | Plain text content |
| image_url | User role only | Image URL reference |
Common Issues and Troubleshooting
Validation Errors
Pydantic validation errors occur when request data doesn't match expected types:
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelObject
type
  Input should be 'chat', 'language', 'code', 'image', 'embedding',...
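Resolution: This error is raised while a ModelObject is being parsed, so it points at a type value that is not among the values the installed SDK accepts (compare Issue #337 under Known Issues and Limitations) rather than at a misspelled model name. Check the reported type against the allowed values in the message, and use client.models.list() to verify which models are available.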
Invalid File Format Errors
When uploading fine-tuning datasets, content validation enforces strict rules:
# Source: src/together/utils/files.py
if not isinstance(item, dict):
    raise InvalidFileFormatError(
        "The dataset is malformed, the `content` field must be a list of dicts.",
        line_number=idx + 1,
        error_source="key_value",
    )
Type Mismatch in Streaming
When processing streaming responses, type assertions ensure correct handling:
# Source: src/together/cli/api/completions.py
if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices
Type Annotations in CLI
The CLI uses Click decorators with type annotations for command-line argument validation:
# Source: src/together/cli/api/chat.py
@click.option(
    "--max-tokens",
    type=int,
    help="Max tokens to generate",
)
@click.option(
    "--temperature",
    type=float,
    help="Sampling temperature",
)
@click.option(
    "--stop",
    type=str,
    multiple=True,
    help="List of strings to stop generation",
)
CLI Type Conversion
| CLI Option Type | Python Type | Notes |
|---|---|---|
| type=int | int | Integer values |
| type=float | float | Decimal values |
| type=str | str | String values |
| multiple=True | tuple | Multiple values |
| is_flag=True | bool | Boolean flags |
Async Type Handling
The SDK provides async variants of response types:
# Source: src/together/resources/chat/completions.py
if stream:
    assert not isinstance(response, TogetherResponse)
    return (ChatCompletionChunk(**line.data) async for line in response)
assert isinstance(response, TogetherResponse)
return ChatCompletionResponse(**response.data)
Best Practices
Type Safety Guidelines
- Use Response Models: Always use SDK response models instead of raw dictionaries
- Validate Early: Check input types before API calls
- Handle Exceptions: Catch specific exception types for targeted error handling
- Use Type Hints: Enable IDE autocomplete with proper imports
Importing Types
from together.types.error import TogetherErrorResponse
from together.error import (
TogetherException,
RateLimitError,
InvalidRequestError,
Timeout,
)
Chat Completions
Related topics: Completions API, Client Architecture
The Chat Completions API provides a unified interface for interacting with large language models on the Together platform through conversational message-based interactions. This feature supports text-only and multimodal inputs, streaming responses, function calling, and various generation parameters to control model behavior.
Overview
The Chat Completions resource is the primary interface for conversational AI interactions in the together-python SDK. It follows the OpenAI-compatible chat completions format, enabling developers to switch between providers with minimal code changes while leveraging Together's distributed inference infrastructure.
graph TD
A[Client Application] --> B[Together Client]
B --> C[Chat Completions.create]
C --> D[API Requestor]
D --> E[Together API]
E --> F[Model Inference]
F --> G[Response]
G --> D
D --> B
B --> H[ChatCompletionResponse]
style A fill:#e1f5fe
style H fill:#c8e6c9
Key capabilities include:
- Text Completions: Standard conversational text generation with system, user, and assistant roles
- Multimodal Input: Support for images alongside text in user messages
- Streaming: Real-time token-by-token response streaming
- Function Calling: Tool-use with structured function definitions and responses
- Safety Controls: Built-in moderation model integration
- Audio Support: Attach audio URLs to messages for Whisper-transcribed context
Source: src/together/resources/chat/completions.py:1-50
Installation and Setup
Environment Configuration
The SDK requires a Together API key for authentication. You can obtain one from the Together Playground settings page.
export TOGETHER_API_KEY=your_api_key_here
Client Initialization
from together import Together
# Using environment variable
client = Together()
# Explicit API key
client = Together(api_key="your_api_key_here")
# Custom base URL for testing
client = Together(
    api_key="your_api_key_here",
    base_url="https://api.together.xyz",
)
Source: README.md
API Reference
Method Signature
ChatCompletions.create(
    model: str,
    messages: List[ChatCompletionMessageParam],
    frequency_penalty: Optional[float] = None,
    max_tokens: Optional[int] = None,
    n: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    stop: Optional[Union[str, List[str]]] = None,
    stream: Optional[bool] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    min_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    logprobs: Optional[int] = None,
    echo: Optional[bool] = None,
    safety_model: Optional[str] = None,
    response_format: Optional[ResponseFormat] = None,
    tools: Optional[List[ChatCompletionToolParam]] = None,
    tool_choice: Optional[Union[ChatCompletionToolChoiceEnum, ChatCompletionNamedToolChoiceParam]] = None,
    audio: Optional[ChatCompletionAudioParam] = None,
    max_completion_tokens: Optional[int] = None,
) -> ChatCompletionResponse
Source: src/together/resources/chat/completions.py:1-50
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | str | Yes | - | Model identifier (e.g., meta-llama/Llama-4-Scout-17B-16E-Instruct) |
| messages | List[ChatCompletionMessageParam] | Yes | - | List of conversation messages with roles |
| temperature | float | No | 0.7 | Sampling temperature (0.0-2.0) |
| top_p | float | No | 1.0 | Nucleus sampling threshold |
| top_k | int | No | - | Top-k token selection |
| min_p | float | No | - | Minimum probability threshold |
| max_tokens | int | No | 256 | Maximum tokens to generate |
| max_completion_tokens | int | No | - | Alternative to max_tokens |
| stream | bool | No | False | Enable streaming response |
| stop | str or List[str] | No | - | Stop sequences |
| n | int | No | 1 | Number of completions to generate |
| presence_penalty | float | No | 0.0 | Penalize repeated tokens |
| frequency_penalty | float | No | 0.0 | Penalize frequent tokens |
| repetition_penalty | float | No | 1.0 | Token repetition penalty |
| logprobs | int | No | - | Return log probabilities |
| echo | bool | No | False | Echo prompt in response |
| safety_model | str | No | - | Moderation model identifier |
| response_format | ResponseFormat | No | - | Constrain output format (JSON schema) |
| tools | List[ChatCompletionToolParam] | No | - | Available function definitions |
| tool_choice | str or dict | No | "auto" | Tool selection strategy |
| audio | ChatCompletionAudioParam | No | - | Audio parameters for voice input |
Message Format
Message Roles
The chat completions API supports structured conversation turns through a role-based message system:
| Role | Description | Content Type |
|---|---|---|
system | Instructions and context | Text only |
user | Human input | Text, images, or mixed |
assistant | Model responses | Text and tool calls |
tool | Function execution results | Text (JSON) |
developer | Developer instructions | Text only |
Message Structure
from typing import List
from together import Together
from together.types.chat.chat_completion_message_param import ChatCompletionMessageParam
client = Together()
messages: List[ChatCompletionMessageParam] = [
{
"role": "system",
"content": "You are a helpful coding assistant."
},
{
"role": "user",
"content": "Write a Python function to calculate factorial."
}
]
response = client.chat.completions.create(
model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
messages=messages
)
print(response.choices[0].message.content)
Source: src/together/resources/chat/completions.py:1-50
Multimodal Messages
User messages can include both text and images using a content array:
response = client.chat.completions.create(
model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
messages=[{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.png"
}
}
]
}]
)
Image URL content items must follow specific validation rules. The image_url field must be a dictionary containing a url key with a valid URL string. Images are only permitted in user role messages.
Source: src/together/utils/files.py:1-50
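The rules above can also be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `validate_message` that is not part of the SDK; the SDK's own validation may differ, for example in which URL schemes it accepts.

```python
from urllib.parse import urlparse


def validate_message(message: dict) -> None:
    """Hypothetical pre-flight check mirroring the rules described above."""
    content = message.get("content")
    if isinstance(content, str):
        return  # plain text content needs no further checks
    if not isinstance(content, list):
        raise ValueError("content must be a string or a list of content items")
    for item in content:
        if not isinstance(item, dict) or item.get("type") != "image_url":
            continue  # only image items are checked in this sketch
        if message.get("role") != "user":
            raise ValueError("images are only permitted in user messages")
        image_url = item.get("image_url")
        if not isinstance(image_url, dict) or "url" not in image_url:
            raise ValueError("image_url must be a dict containing a 'url' key")
        scheme = urlparse(str(image_url["url"])).scheme
        # Assumption: http(s) and data URLs are treated as valid here.
        if scheme not in ("http", "https", "data"):
            raise ValueError(f"unsupported image URL: {image_url['url']!r}")
```

Calling `validate_message(m)` for each message before `client.chat.completions.create(...)` surfaces malformed content early, without a network round trip.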
Streaming Responses
The API supports server-sent events (SSE) streaming for real-time token generation:
from together import Together
client = Together()
stream = client.chat.completions.create(
model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
messages=[{"role": "user", "content": "Explain quantum computing"}],
stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming Architecture
sequenceDiagram
participant Client
participant APIRequestor
participant TogetherAPI
participant Model
Client->>APIRequestor: create(stream=True)
APIRequestor->>TogetherAPI: POST /chat/completions
TogetherAPI->>Model: Start inference
Model-->>TogetherAPI: Token 1
TogetherAPI-->>APIRequestor: SSE: data: {...}
APIRequestor-->>Client: ChatCompletionChunk
Model-->>TogetherAPI: Token 2
TogetherAPI-->>APIRequestor: SSE: data: {...}
APIRequestor-->>Client: ChatCompletionChunk
Note over Model,Client: Streaming continues...
Model-->>TogetherAPI: [DONE]
TogetherAPI-->>APIRequestor: [DONE]
APIRequestor-->>Client: Iterator ends
When streaming is enabled, the method returns an iterator (an async iterator when called on AsyncTogether) that yields ChatCompletionChunk objects. Each chunk carries an incremental delta; concatenate the deltas to reconstruct the complete response.
Source: src/together/resources/chat/completions.py:40-80
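Accumulating the deltas can be sketched as follows. This is an illustration rather than SDK code: `collect_text` and `fake_chunk` are hypothetical names, and the stand-in objects only mimic the `choices[0].delta.content` attribute path used in the streaming example.

```python
from types import SimpleNamespace


def collect_text(stream) -> str:
    """Join the incremental deltas of a chunk stream into the full reply."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        if delta is not None and delta.content:
            parts.append(delta.content)
    return "".join(parts)


def fake_chunk(text):
    # Stand-in with the same attribute shape as a streamed chunk.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [fake_chunk("Quantum "), fake_chunk(None), fake_chunk("computing")]
print(collect_text(demo_stream))
```

With a real client, the iterator returned by `create(..., stream=True)` would be passed in place of `demo_stream`.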
Function Calling
Function calling enables models to invoke predefined tools with structured outputs. This follows the OpenAI function calling schema.
Defining Tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
}
]
Tool Execution Flow
graph TD
A[User Query] --> B[Create with tools]
B --> C{Model selects tool?}
C -->|Yes| D[Return tool_call]
C -->|No| E[Return text response]
D --> F[Execute function]
F --> G[tool role message]
G --> H[Continue with messages]
H --> B
E --> I[Final Response]
style D fill:#fff3e0
style G fill:#e8f5e9
Multi-turn Conversation
After receiving a function call, append the assistant's tool call message and the tool response:
import json
# Initial request with tools
response = client.chat.completions.create(
model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
messages=[{"role": "user", "content": "What's the weather in Paris?"}],
tools=tools
)
assistant_msg = response.choices[0].message
print(f"Tool called: {assistant_msg.tool_calls[0].function.name}")
print(f"Arguments: {assistant_msg.tool_calls[0].function.arguments}")
# Simulate tool execution
tool_result = {"temperature": 22, "conditions": "Sunny"}
# Continue conversation with tool response
messages = [
{"role": "user", "content": "What's the weather in Paris?"},
assistant_msg,
{
"role": "tool",
"tool_call_id": assistant_msg.tool_calls[0].id,
"content": json.dumps(tool_result)
}
]
final_response = client.chat.completions.create(
model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
messages=messages,
tools=tools
)
Note: There is a known issue (#113) where tool/function response messages with role='tool' may encounter validation errors. Ensure the tool_call_id matches exactly and the content is valid JSON.
Source: src/together/resources/chat/completions.py:20-60
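The "execute function" step of the flow can be sketched without any API call. The names `TOOL_REGISTRY` and `run_tool_call` are hypothetical, the `get_weather` body is a placeholder, and the sketch assumes the arguments arrive as a JSON string, as in the OpenAI function calling schema.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    # Placeholder implementation; a real tool would query a weather service.
    return {"location": location, "unit": unit, "temperature": 22}


# Maps tool names (as declared in the tools list) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str, tool_call_id: str) -> dict:
    """Execute one tool call and build the matching role='tool' message."""
    try:
        result = TOOL_REGISTRY[name](**json.loads(arguments))
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        # Report the failure to the model instead of raising.
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }


print(run_tool_call("get_weather", '{"location": "Paris"}', "call_123"))
```

In the multi-turn example, the three arguments would come from `assistant_msg.tool_calls[0].function.name`, `.function.arguments`, and `.id`. Serializing the result with `json.dumps` keeps the content valid JSON, which is what the note above asks for.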
CLI Interface
The Together CLI provides command-line access to chat completions:
# Basic chat completion (each --message takes a role followed by its content)
together chat.completions \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --message "user" "Hello, how are you?"
# Streaming response (the default; pass --no-stream to disable it)
together chat.completions \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --message "user" "Write a story"
# With temperature control
together chat.completions \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --message "user" "Explain physics" \
  --temperature 0.8
CLI Options
| Option | Type | Description |
|---|---|---|
--message | (str, str) multiple | Message as role-content tuple |
--model | str | Model identifier (required) |
--max-tokens | int | Maximum tokens to generate |
--temperature | float | Sampling temperature |
--top-p | int | Nucleus sampling |
--top-k | float | Top-k sampling |
--stop | str multiple | Stop sequences |
--repetition-penalty | float | Repetition penalty |
--presence-penalty | float | Presence penalty |
--frequency-penalty | float | Frequency penalty |
--min-p | float | Minimum p sampling |
--no-stream | flag | Disable streaming |
--safety-model | str | Moderation model |
--raw | flag | Return raw JSON |
Source: src/together/cli/api/chat.py:1-100
Error Handling
The SDK provides structured exception types for different error conditions:
from together import Together
from together.error import (
TogetherException,
RateLimitError,
APIConnectionError,
Timeout,
AuthenticationError
)
client = Together()
try:
    response = client.chat.completions.create(
        model="invalid-model-name",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limited: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except TogetherException as e:
    print(f"API error: {e}")
Exception Hierarchy
classDiagram
class TogetherException {
+message
}
class RateLimitError {
+message
}
class APIConnectionError {
+message
}
class Timeout {
+message
}
class AuthenticationError {
+message
}
class FileTypeError {
+message
}
TogetherException <|-- RateLimitError
TogetherException <|-- APIConnectionError
TogetherException <|-- Timeout
TogetherException <|-- FileTypeError
TogetherException <|-- AuthenticationError
Common Error Codes
| Error Type | Cause | Resolution |
|---|---|---|
400 Bad Request | Invalid parameters | Check message format, model name |
401 Unauthorized | Invalid API key | Verify TOGETHER_API_KEY |
429 Too Many Requests | Rate limit exceeded | Implement exponential backoff |
500 Internal Error | Server error | Retry with backoff |
504 Gateway Timeout | Request timeout | Increase timeout or retry |
Source: src/together/error.py:1-80
Response Format
Standard Response
response = client.chat.completions.create(
model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
messages=[{"role": "user", "content": "Hello"}]
)
# Access response attributes
print(response.id) # chatcmpl-xxx
print(response.model) # meta-llama/Llama-4-Scout-17B-16E-Instruct
print(response.choices[0].message.content) # Response text
print(response.usage.prompt_tokens) # Input tokens
print(response.usage.completion_tokens) # Output tokens
print(response.usage.total_tokens) # Total tokens
Streaming Chunk
for chunk in stream:
    # ChatCompletionChunk structure
    print(chunk.id)  # Same ID as final response
    print(chunk.choices[0].delta.content)  # Incremental content
    print(chunk.choices[0].finish_reason)  # 'stop' or 'length'
Async Usage
The SDK provides async variants for concurrent operations:
import asyncio
from together import AsyncTogether
async_client = AsyncTogether()
async def multi_chat():
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": f"Query {i}"}]
        )
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].message.content)
asyncio.run(multi_chat())
Retry Logic and Timeouts
The API requestor implements automatic retry with exponential backoff:
from together.constants import (
MAX_RETRIES,
INITIAL_RETRY_DELAY,
MAX_RETRY_DELAY,
TIMEOUT_SECS
)
# Default configuration (values as reported here; confirm against together/constants.py in the installed release)
# MAX_RETRIES: 10
# INITIAL_RETRY_DELAY: 0.5 seconds
# MAX_RETRY_DELAY: 120 seconds
# TIMEOUT_SECS: 600 seconds
# Custom configuration
client = Together(
max_retries=5,
timeout=300
)
The retry strategy handles:
- Connection timeouts
- 5xx server errors
- Rate limit responses (429)
Source: src/together/abstract/api_requestor.py:1-100
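To make the effect of these constants concrete, the sketch below computes a capped exponential schedule. It is a generic illustration, not the SDK's implementation: `backoff_delays` is a hypothetical function, and both the doubling factor and the optional jitter are assumptions, since the requestor's exact formula is not shown here.

```python
import random


def backoff_delays(max_retries, initial_delay, max_delay, jitter=False, rng=random.random):
    """Return the wait before each retry: the delay doubles until it hits the cap."""
    delays = []
    for attempt in range(max_retries):
        delay = min(initial_delay * (2 ** attempt), max_delay)
        if jitter:
            delay *= 0.5 + rng() / 2  # scale into [50%, 100%) of the delay
        delays.append(delay)
    return delays


print(backoff_delays(max_retries=5, initial_delay=0.5, max_delay=120))
# → [0.5, 1.0, 2.0, 4.0, 8.0]
```

Jitter spreads out retries from many clients that failed at the same moment; passing a fixed `rng` makes the schedule reproducible in tests.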
Known Limitations
| Issue | Description | Workaround |
|---|---|---|
| typer version conflict | SDK requires typer<0.16.0 | Use virtual environments |
| Model type validation | Some model types not recognized | Use model names directly |
| Tool response format | role='tool' messages may fail validation | Ensure proper tool_call_id and JSON content |
For the most current issues and workarounds, refer to the GitHub Issues.
Best Practices
- Token Management: Always set max_tokens to prevent runaway generation
- Error Handling: Wrap API calls in try-except blocks with appropriate exception handling
- Streaming: Use streaming for better perceived latency on long responses
- Context Management: Keep message lists manageable; trim old messages when the conversation exceeds the model's context window
- Safety: Enable safety_model for user-facing applications
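The context management advice can be sketched as a small helper. `trim_messages` is a hypothetical function, and it trims by message count as a simple stand-in; a token-based budget would need a tokenizer for the chosen model.

```python
def trim_messages(messages, max_messages):
    """Keep system/developer messages plus the most recent other messages."""
    pinned_roles = ("system", "developer")
    pinned = [m for m in messages if m["role"] in pinned_roles]
    rest = [m for m in messages if m["role"] not in pinned_roles]
    keep = max(max_messages - len(pinned), 0)
    # rest[-0:] would return everything, so handle keep == 0 explicitly.
    return pinned + (rest[-keep:] if keep else [])


history = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Second question"},
    {"role": "assistant", "content": "Second answer"},
]
print(trim_messages(history, max_messages=3))
```

When tool calls are involved, a trimming step like this should avoid separating an assistant tool call from its matching tool message.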
See Also
- Fine-tuning Documentation - Training custom models
- Embeddings - Vector representations
- Image Generation - Multimodal generation
- CLI Reference - Full CLI documentation
- Error Handling - Exception types and recovery
Source: https://github.com/togethercomputer/together-python / Human Manual
Completions API
Related topics: Chat Completions, Embeddings and Reranking
The Completions API provides access to language model text completion endpoints in the Together AI platform. This API enables developers to generate text completions from various open-source models hosted on Together AI, supporting use cases ranging from code generation to creative writing.
Overview
The Together Python SDK provides two primary APIs for text generation:
- Completions API - Designed for legacy text completion models and prompt-based generation
- Chat Completions API - Optimized for modern chat-based models with structured message formats
Both APIs support synchronous, asynchronous, and streaming modes of operation.
Source: README.md:1-50
Installation and Setup
Environment Configuration
The SDK requires a Together API key for authentication. You can obtain one from the Together Playground settings page.
export TOGETHER_API_KEY=xxxxx
Client Initialization
from together import Together
# Using environment variable
client = Together()
# Explicit API key
client = Together(api_key="xxxxx")
Source: README.md:10-20
Usage Patterns
Synchronous Completion
The synchronous method blocks until the complete response is received:
from together import Together
client = Together()
response = client.completions.create(
model="codellama/CodeLlama-34b-Python-hf",
prompt="Write a Next.js component with TailwindCSS for a header component.",
max_tokens=200,
)
print(response.choices[0].text)
Source: README.md:80-90
Streaming Completion
Streaming allows real-time response generation by processing chunks as they arrive:
from together import Together
client = Together()
stream = client.completions.create(
model="codellama/CodeLlama-34b-Python-hf",
prompt="Write a Next.js component with TailwindCSS for a header component.",
stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
Source: README.md:92-103
Asynchronous Completion
The async API enables concurrent requests for improved throughput:
import asyncio
from together import AsyncTogether
async_client = AsyncTogether()
prompts = [
"Write a Next.js component with TailwindCSS for a header component.",
"Write a python function for the fibonacci sequence",
]
async def async_completion(prompts):
    tasks = [
        async_client.completions.create(
            model="codellama/CodeLlama-34b-Python-hf",
            prompt=prompt,
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].text)
asyncio.run(async_completion(prompts))
Source: README.md:105-125
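Because asyncio.gather schedules every coroutine at once, a long prompt list can put many requests in flight simultaneously. One way to bound that is a semaphore, sketched below. `gather_limited` is a hypothetical helper and `fake_completion` stands in for `async_client.completions.create(...)`, so the example makes no network call.

```python
import asyncio


async def gather_limited(coros, limit):
    """Run coroutines concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        async with semaphore:
            return await coro

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(run(c) for c in coros))


async def fake_completion(prompt):
    # Stand-in for an SDK call; sleeps briefly instead of contacting the API.
    await asyncio.sleep(0.01)
    return f"completion for: {prompt}"


results = asyncio.run(
    gather_limited([fake_completion(p) for p in ["a", "b", "c"]], limit=2)
)
print(results)
```

The limit value is an application-level choice; the right number depends on the rate limits that apply to the account.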
API Parameters
Core Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier from the available Together AI models |
prompt | string | Yes | The input prompt for text generation |
max_tokens | integer | No | Maximum number of tokens to generate |
temperature | float | No | Sampling temperature (0.0-2.0, default varies by model) |
top_p | float | No | Nucleus sampling probability threshold |
top_k | integer | No | Top-k sampling parameter |
stream | boolean | No | Enable streaming response (default: false) |
n | integer | No | Number of completions to generate |
stop | string/array | No | Stop sequence(s) to end generation |
logprobs | integer | No | Number of top log probabilities to return |
echo | boolean | No | Echo the prompt in the response |
repetition_penalty | float | No | Penalty for token repetition |
presence_penalty | float | No | Penalize tokens based on presence |
frequency_penalty | float | No | Penalize tokens based on frequency |
min_p | float | No | Minimum probability threshold for sampling |
safety_model | string | No | Moderation model to use |
Source: src/together/cli/api/completions.py:1-50
Parameter Details
#### Sampling Parameters
- temperature: Controls randomness in generation. Lower values (0.1-0.3) produce more deterministic output, while higher values (0.7-1.0) increase creativity.
- top_p: Also known as nucleus sampling, controls the cumulative probability mass to consider.
- top_k: Limits token selection to the top k most probable tokens.
#### Repetition Control
- repetition_penalty: Values > 1.0 discourage repetition, values < 1.0 encourage it.
- presence_penalty: Positive values encourage discussing new topics.
- frequency_penalty: Positive values reduce repetition of high-frequency tokens.
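How the sampling parameters above interact can be illustrated on a toy distribution. The sketch follows the textbook definitions of temperature scaling, top-k, and nucleus (top-p) filtering; `filter_distribution` is a hypothetical function, and the server-side implementation (including the order in which filters are applied) may differ.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the renormalised token probabilities left after each filter."""
    # Temperature: divide logits before softmax; lower values sharpen the peak.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most probable tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:  # smallest prefix reaching mass top_p
                break
        ranked = kept
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


toy_logits = {"the": 2.0, "a": 1.0, "cat": 0.0, "zebra": -2.0}
print(filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

The sketch requires a temperature greater than zero; a temperature of 0 is conventionally treated as greedy decoding and would need a separate branch.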
CLI Usage
The SDK includes a command-line interface for completions:
together completions "Your prompt here" --model codellama/CodeLlama-34b-Python-hf
CLI Options
| Option | Short | Description | Default |
|---|---|---|---|
--model | -m | Model name | Required |
--max-tokens | -t | Max tokens to generate | None |
--temperature | -T | Sampling temperature | None |
--top-p | -p | Top p sampling | None |
--top-k | -k | Top k sampling | None |
--stop | -s | Stop sequences (multiple allowed) | None |
--no-stream | -ns | Disable streaming | False |
--repetition-penalty | -rp | Repetition penalty | None |
--presence-penalty | -pp | Presence penalty | None |
--frequency-penalty | -fp | Frequency penalty | None |
--min-p | -mp | Minimum p | None |
--logprobs | -l | Return log probabilities | None |
--echo | -e | Echo prompt in response | False |
--n | -n | Number of generations | None |
--safety-model | -sm | Moderation model | None |
--raw | -r | Return raw JSON response | False |
Source: src/together/cli/api/completions.py:1-75
CLI Streaming Output
When streaming is enabled (default), the CLI processes chunks in real-time:
if not no_stream:
    for chunk in response:
        assert isinstance(chunk, CompletionChunk)
        assert chunk.choices
        if raw:
            click.echo(f"{json.dumps(chunk.model_dump(exclude_none=True))}")
            continue
        for stream_choice in sorted(chunk.choices, key=lambda c: c.index):
            assert isinstance(stream_choice, CompletionChoicesChunk)
            assert stream_choice.delta
            click.echo(f"{stream_choice.delta.content}", nl=False)
Source: src/together/cli/api/completions.py:45-65
Response Structure
Completion Response
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier for the completion |
choices | array | Array of completion choices |
choices[].text | string | Generated text content |
choices[].index | integer | Choice index for multiple completions |
choices[].finish_reason | string | Reason for completion ending |
model | string | Model used for generation |
usage | object | Token usage statistics |
Streaming Chunk Response
| Field | Type | Description |
|---|---|---|
id | string | Chunk identifier |
choices | array | Array of delta choices |
choices[].delta | object | Incremental text delta |
choices[].delta.content | string | Delta text content |
choices[].index | integer | Choice index |
Architecture
Request Flow
graph TD
A[Client.completions.create] --> B[Validate Parameters]
B --> C[APIRequestor]
C --> D{HTTP Method}
D -->|POST| E[Send Request to together.ai]
D -->|Streaming| F[Return Chunk Iterator]
E --> G[Parse Response]
G --> H[Return CompletionResponse]
F --> I[Stream Chunks]
I --> J[Yield CompletionChunk]
Response Handling
graph TD
A[API Response] --> B{Streaming Mode?}
B -->|Yes| C[Return Chunk Iterator]
B -->|No| D[Return TogetherResponse]
C --> E[CompletionChunk]
D --> F[CompletionResponse]
Error Handling
Exception Types
The SDK defines specific exception types for different error conditions:
| Exception | Description |
|---|---|
TogetherException | Base exception class |
RateLimitError | API rate limit exceeded |
APIConnectionError | Network connectivity issues |
Timeout | Request timeout |
FileTypeError | Invalid file type |
AttributeError | Invalid attribute access |
Source: src/together/error.py:1-60
Error Response Structure
class TogetherErrorResponse(BaseModel):
    message: str
    type: str
    code: Optional[str] = None
    param: Optional[str] = None
Common Error Scenarios
- Rate Limiting: When API rate limits are exceeded, the SDK automatically retries with exponential backoff based on configuration.
- Timeout: Configurable timeout with default handling:
# Default timeout in seconds (value as reported here; confirm against together/constants.py in the installed release)
TIMEOUT_SECS = 60
Source: src/together/abstract/api_requestor.py:20-40
- Invalid Model: Returns validation error with available model list
Configuration Options
Client Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
api_key | string | env: TOGETHER_API_KEY | API authentication key |
base_url | string | api.together.ai | API base URL |
timeout | integer | 60 | Request timeout in seconds |
max_retries | integer | 3 | Maximum retry attempts |
Retry Configuration
MAX_RETRIES = 3
INITIAL_RETRY_DELAY = 0.5 # seconds
MAX_RETRY_DELAY = 2.0 # seconds
MAX_CONNECTION_RETRIES = 2
MAX_SESSION_LIFETIME_SECS = 300
Source: src/together/abstract/api_requestor.py:20-40
Known Limitations and Issues
Dependency Conflicts
The SDK has a dependency on typer<0.16.0, which may cause conflicts with projects requiring newer versions of typer. This is a known issue tracked in #348.
Variable Scope Issue
A known UnboundLocalError issue can occur in certain error scenarios when the result variable is referenced before assignment. This is being tracked in #143.
Best Practices
Efficient Usage
- Use Streaming for Long Outputs: When expecting long completions, use streaming to improve perceived latency
- Batch Requests with Async: Use AsyncTogether for parallel API calls
- Set Appropriate Limits: Configure max_tokens to prevent excessive generation
Production Considerations
- Implement Retry Logic: The SDK handles retries, but implement additional logic for critical operations
- Monitor Token Usage: Track usage via the response usage field
- Use Safety Models: Enable moderation for user-facing applications
See Also
- Chat Completions API - Modern chat-based completion API
- Fine-tuning Guide - Training custom models
- Models Documentation - Available models and selection
- API Reference - Complete API documentation
Source: https://github.com/togethercomputer/together-python / Human Manual
Embeddings and Reranking
Related topics: Chat Completions, Files API
The Together Python SDK provides first-class support for text embeddings and document reranking through dedicated resource classes. These features enable semantic search, document retrieval, and information discovery workflows by converting text into dense vector representations and reordering search results based on relevance.
Overview
Embeddings and reranking are complementary capabilities that power modern retrieval-augmented generation (RAG) and search systems. The SDK exposes these through the embeddings and rerank namespaces on the main Together client, following a consistent pattern with other API resources like chat completions.
graph LR
A[Text Input] --> B[Embeddings API]
B --> C[Vector Embeddings]
C --> D[Reranking API]
D --> E[Re-ranked Results]
F[Query] --> D
G[Document Pool] --> D
Key characteristics:
- Both endpoints use the same Together client instance
- Responses are returned as Pydantic model objects for type safety
- Both support synchronous and async patterns via Together and AsyncTogether
- Input text requires newline normalization for optimal results
Embeddings
Purpose and Use Cases
The Embeddings API converts text into high-dimensional vector representations that capture semantic meaning. These vectors can be stored in vector databases and used for similarity search, clustering, or as features for downstream ML tasks.
Common use cases include:
- Semantic search systems
- Document clustering and categorization
- Recommendation systems
- Duplicate detection
- Feature extraction for classification tasks
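Similarity search over such vectors usually relies on cosine similarity. The sketch below uses toy three-dimensional vectors in place of real embeddings; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not SDK functions.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for the output of client.embeddings.create(...).
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_by_similarity(query_vec, doc_vecs))
# → [1, 2, 0]
```

Dividing by the vector lengths makes the score independent of magnitude, so only the direction of the embeddings is compared.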
Python Client Usage
from typing import List
from together import Together
client = Together()
def get_embeddings(texts: List[str], model: str) -> List[List[float]]:
    # Normalize newlines as recommended by the SDK
    texts = [text.replace("\n", " ") for text in texts]
    outputs = client.embeddings.create(model=model, input=texts)
    # Extract embedding vectors in order
    return [outputs.data[i].embedding for i in range(len(texts))]
# Example usage
input_texts = ["Our solar system orbits the Milky Way galaxy at about 515,000 mph"]
embeddings = get_embeddings(
input_texts,
model="togethercomputer/m2-bert-80M-8k-retrieval"
)
print(embeddings)
Embeddings Response Model
The EmbeddingsCreateResponse model provides structured access to API responses:
| Field | Type | Description |
|---|---|---|
object | str | Object type, typically "list" |
data | List[Embedding] | List of embedding objects |
model | str | Model used for embeddings |
usage | EmbeddingUsage | Token usage statistics |
Each Embedding object contains:
| Field | Type | Description |
|---|---|---|
object | str | Object type, typically "embedding" |
embedding | List[float] | The embedding vector |
index | int | Position in the input list |
The EmbeddingUsage object tracks:
| Field | Type | Description |
|---|---|---|
prompt_tokens | int | Tokens in the input |
total_tokens | int | Total tokens processed |
API Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
model | str | Yes | - | Embedding model identifier |
input | Union[str, List[str]] | Yes | - | Text(s) to embed |
Available Embedding Models
The SDK works with embedding models available on the Together platform. Common models include:
- togethercomputer/m2-bert-80M-8k-retrieval - 8K context, 80M parameters
- togethercomputer/m2-bert-80M-2k-retrieval - 2K context, 80M parameters
Model availability can be queried using:
models = client.models.list()
# Filter for embedding models
Reranking
Purpose and Use Cases
The Reranking API takes a query and a set of documents, then returns those documents reordered by relevance to the query. This is particularly valuable when combined with embeddings-based retrieval to refine initial search results.
Common use cases include:
- Improving search result quality after initial embedding-based retrieval
- Multi-stage retrieval pipelines
- Reordering candidates from vector similarity search
- Question answering systems retrieving relevant context
Python Client Usage
from typing import List
from together import Together
client = Together()
def get_reranked_documents(
    query: str,
    documents: List[str],
    model: str,
    top_n: int = 3
) -> List[str]:
    outputs = client.rerank.create(
        model=model,
        query=query,
        documents=documents,
        top_n=top_n
    )
    # Sort the results by relevance score, then map each result back to
    # its original document through result.index
    ranked = sorted(outputs.results, key=lambda r: r.relevance_score, reverse=True)
    return [documents[r.index] for r in ranked]
# Example usage
query = "What is the capital of the United States?"
documents = ["New York", "Washington, D.C.", "Los Angeles"]
# model is a required argument; verify the identifier against the current model list
reranked = get_reranked_documents(query, documents, model="BAAI/bge-reranker", top_n=3)
print(reranked)  # e.g. ["Washington, D.C.", "New York", "Los Angeles"]
Reranking Response Model
The RerankResponse model provides structured access to reranking results:
| Field | Type | Description |
|---|---|---|
id | str | Request identifier |
results | List[Ranking] | List of ranked documents |
meta | RerankMeta | Metadata including model and usage |
object | str | Object type |
Each Ranking object contains:
| Field | Type | Description |
|---|---|---|
index | int | Original document index |
relevance_score | float | Relevance score (higher = more relevant) |
document | Document | The document object with text |
The Document object:
| Field | Type | Description |
|---|---|---|
text | str | Document text content |
The RerankMeta object:
| Field | Type | Description |
|---|---|---|
model_id | str | Model used for reranking |
usage | RerankUsage | Token usage statistics |
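The `index` field is what ties a ranking back to the input list. The sketch below shows that mapping with stand-in objects shaped like the Ranking fields above; `top_documents` is a hypothetical helper and the scores are made up for illustration.

```python
from types import SimpleNamespace


def top_documents(documents, results):
    """Pair each ranking with its original document, best score first."""
    ordered = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    return [(documents[r.index], r.relevance_score) for r in ordered]


# Stand-ins for the entries of a rerank response's results list.
documents = ["New York", "Washington, D.C.", "Los Angeles"]
results = [
    SimpleNamespace(index=0, relevance_score=0.12),
    SimpleNamespace(index=1, relevance_score=0.97),
]
print(top_documents(documents, results))
```

Looking documents up through `index` works even when the response omits document text, and it stays correct when fewer results than documents are returned.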
API Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
model | str | Yes | - | Reranking model identifier |
query | str | Yes | - | The query to rank documents against |
documents | List[str] | Yes | - | Documents to be ranked |
top_n | int | No | 3 | Number of top results to return |
max_chunks_per_doc | int | No | None | Max chunks per document (model-dependent) |
return_documents | bool | No | True | Whether to include document text in response |
Combined Workflow
A typical retrieval pipeline combines embeddings and reranking:
graph TD
A[User Query] --> B[Embed Query]
C[Document Corpus] --> D[Embed All Documents]
B --> E[Vector Similarity Search]
D --> E
E --> F[Candidate Documents]
F --> G[Rerank with Query]
G --> H[Final Results]
I[Vector Database] <--> DComplete Example
from typing import List
from together import Together
client = Together()
EMBEDDING_MODEL = "togethercomputer/m2-bert-80M-8k-retrieval"
RERANK_MODEL = "BAAI/bge-reranker"
def semantic_search(
    query: str,
    documents: List[str],
    embedding_model: str = EMBEDDING_MODEL,
    rerank_model: str = RERANK_MODEL,
    top_k: int = 10,
    final_k: int = 3
) -> List[dict]:
    """
    Combined embeddings + reranking search pipeline.
    """
    # Step 1: Embed the query
    query_embedding = client.embeddings.create(
        model=embedding_model,
        input=query.replace("\n", " ")
    ).data[0].embedding
    # Step 2: Embed all documents
    doc_embeddings = client.embeddings.create(
        model=embedding_model,
        input=[doc.replace("\n", " ") for doc in documents]
    )
    # Step 3: Cosine similarity (for demonstration)
    # In production, use a proper vector database
    query_norm = sum(q * q for q in query_embedding) ** 0.5
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings.data):
        dot = sum(q * d for q, d in zip(query_embedding, doc_emb.embedding))
        doc_norm = sum(d * d for d in doc_emb.embedding) ** 0.5
        denom = query_norm * doc_norm
        similarities.append((i, dot / denom if denom else 0.0))
    # Sort by similarity and take top_k
    similarities.sort(key=lambda x: x[1], reverse=True)
    candidate_indices = [idx for idx, _ in similarities[:top_k]]
    candidate_docs = [documents[i] for i in candidate_indices]
    # Step 4: Rerank candidates
    rerank_results = client.rerank.create(
        model=rerank_model,
        query=query,
        documents=candidate_docs,
        top_n=final_k
    )
    # Step 5: Extract final results with scores
    # result.index points into candidate_docs, so map it back to the corpus
    results = []
    for result in rerank_results.results:
        results.append({
            "document": candidate_docs[result.index],
            "relevance_score": result.relevance_score,
            "original_index": candidate_indices[result.index]
        })
    return results
# Usage
query = "machine learning optimization techniques"
corpus = [
"Gradient descent is a first-order iterative optimization algorithm.",
"The capital of France is Paris.",
"Stochastic gradient descent uses random subsets of data.",
"Climate change affects global weather patterns.",
"Adam optimizer combines momentum and RMSprop concepts."
]
results = semantic_search(query, corpus)
for r in results:
    print(f"Score: {r['relevance_score']:.4f} - {r['document']}")
Async Usage
Both embeddings and reranking support asynchronous operations:
import asyncio
from together import AsyncTogether
async_client = AsyncTogether()
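# Example batches; each inner list is sent as one embeddings request
batched_documents = [
    ["First document", "Second document"],
    ["Third document"],
]
async def async_embeddings():
    tasks = [
        async_client.embeddings.create(
            model="togethercomputer/m2-bert-80M-8k-retrieval",
            input=texts
        )
        for texts in batched_documents
    ]
    results = await asyncio.gather(*tasks)
    return results
async def async_rerank():
    return await async_client.rerank.create(
        model="BAAI/bge-reranker",
        query="What is deep learning?",
        documents=["Doc 1", "Doc 2", "Doc 3"],
        top_n=3
    )
async def main():
    embeddings = await async_embeddings()
    reranked = await async_rerank()
    return embeddings, reranked
# Run both coroutines in a single event loop
asyncio.run(main())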
CLI Support
The CLI provides commands for embeddings and reranking operations:
# Embeddings via CLI (using completions with embeddings model)
together completions \
"Our solar system orbits the Milky Way galaxy" \
--model togethercomputer/m2-bert-80M-8k-retrieval
Note: Direct CLI commands for embeddings may require specific model configurations. For full reranking CLI support, use the Python API.
Error Handling
Both resources can raise standard Together exceptions defined in src/together/error.py:
| Error Type | Description |
|---|---|
TogetherException | Base exception class |
RateLimitError | API rate limit exceeded |
APIConnectionError | Network connectivity issues |
from together import Together
from together.error import TogetherException, RateLimitError
client = Together()
try:
    response = client.embeddings.create(
        model="togethercomputer/m2-bert-80M-8k-retrieval",
        input="Sample text"
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except TogetherException as e:
    print(f"API error: {e}")
Input Text Normalization
The SDK documentation recommends normalizing newline characters in input text:
# Recommended: normalize input text
normalized_texts = [text.replace("\n", " ") for text in texts]
# Create embeddings
response = client.embeddings.create(
model="togethercomputer/m2-bert-80M-8k-retrieval",
input=normalized_texts
)
This normalization helps ensure consistent embedding quality across varied text inputs.
Known Limitations
Based on community feedback and issue tracking:
- Model availability: Embedding and reranking model availability may vary. Always verify model identifiers against the Together model marketplace.
- Batch sizes: Large batches of documents may require multiple API calls. Consider batching strategies for large document collections.
- Token limits: Both APIs have token limits that may restrict single-request document counts. Monitor usage fields in responses.
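The batching advice above can be sketched with a small helper that splits a document list into fixed-size chunks, each of which would then be sent as one request. `batch_documents` and the batch size of 2 are illustrative assumptions; the source does not state a recommended batch size.

```python
from typing import Iterator, List, Sequence


def batch_documents(documents: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `documents`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield list(documents[start:start + batch_size])


docs = [f"Document {i}" for i in range(5)]
batches = list(batch_documents(docs, batch_size=2))
print([len(batch) for batch in batches])
```

Each yielded batch could be passed as the `input` argument of one `client.embeddings.create` call.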
See Also
- Chat Completions - Interactive text generation
- Fine-tuning - Custom model training
- Image Generation - Image creation capabilities
- Together API Documentation - Platform-level API reference
- Contributing Guide - SDK contribution guidelines
Image Generation
Related topics: Chat Completions
The Image Generation module in together-python provides programmatic access to Together AI's image synthesis API, which generates images from text prompts using diffusion models. It supports synchronous and asynchronous requests, includes a CLI, and returns images as Base64-encoded data or URLs.
Overview
The Image Generation feature is part of the Together AI Python SDK that abstracts the complexity of API communication and response parsing. It allows developers to:
- Generate images from text prompts using supported diffusion models
- Configure generation parameters such as dimensions, steps, and seed
- Support negative prompts to guide generation away from unwanted elements
- Return images as Base64-encoded data or URLs
- Integrate seamlessly with other SDK features like chat completions and embeddings
Image generation is accessed through the client.images namespace in the main Together client, following a consistent pattern used throughout the SDK. Source: src/together/resources/images.py:1-50
Architecture
Component Overview
The image generation system consists of several interconnected components that work together to provide a unified interface:
graph TD
A[User Code] --> B[Together Client]
B --> C[Images Resource]
C --> D[APIRequestor]
D --> E[Together API]
E --> F[ImageResponse]
F --> G[User Code]
H[CLI Command] --> C
I[ImageCLI] --> B
J[ImageRequest Type] --> C
K[ImageResponse Type] --> F
Module Structure
| Component | File Path | Purpose |
|---|---|---|
| Images Resource | src/together/resources/images.py | Main API client for image generation |
| Image Types | src/together/types/images.py | Pydantic models for request/response validation |
| CLI Module | src/together/cli/api/images.py | Command-line interface for image generation |
| File Utils | src/together/utils/files.py | Helper utilities for file operations |
Source: src/together/resources/images.py
API Reference
Client Method: `client.images.generate()`
The primary method for generating images. Supports both synchronous and asynchronous operation modes.
Signature:
async def generate(
    self,
    prompt: str,
    model: str,
    *,
    seed: Optional[int] = None,
    n: int = 1,
    height: int = 1024,
    width: int = 1024,
    negative_prompt: Optional[str] = None,
    **kwargs,
) -> ImageResponse
#### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | Required | Text description of the desired image |
| model | str | Required | Model identifier (e.g., stabilityai/stable-diffusion-xl-base-1.0) |
| seed | int | None | Random seed for reproducible generation |
| n | int | 1 | Number of images to generate |
| height | int | 1024 | Output image height in pixels |
| width | int | 1024 | Output image width in pixels |
| negative_prompt | str | None | Prompt describing elements to avoid |
| **kwargs | Any | N/A | Additional model-specific parameters |
Source: src/together/resources/images.py:36-60
#### Returns
| Field | Type | Description |
|---|---|---|
| data | List[ImageChoicesData] | List of generated image objects |
| data[0].b64_json | str | Base64-encoded PNG image data |
| data[0].url | str | Remote URL to the generated image (if available) |
| data[0].revised_prompt | str | Prompt revised by the model's safety filter |
Source: src/together/types/images.py
Response Type: `ImageResponse`
The ImageResponse object wraps the API response with additional metadata:
class ImageResponse(TogetherBaseResponse):
    data: List[ImageChoicesData]
Source: src/together/types/images.py
Usage Patterns
Basic Synchronous Usage
from together import Together
client = Together()
response = client.images.generate(
prompt="space robots",
model="stabilityai/stable-diffusion-xl-base-1.0",
steps=10,
n=4,
)
# Access base64-encoded images
for image_data in response.data:
    print(image_data.b64_json)
# Access revised prompt (if modified by safety filter)
for image_data in response.data:
    if image_data.revised_prompt:
        print(f"Revised prompt: {image_data.revised_prompt}")
Source: README.md
Using Seed for Reproducibility
from together import Together
client = Together()
# Generate with a fixed seed for reproducible results
response = client.images.generate(
prompt="a serene mountain landscape at sunset",
model="stabilityai/stable-diffusion-xl-base-1.0",
seed=42,
n=1,
height=768,
width=768,
)
Multiple Images in Single Request
import base64
from together import Together
client = Together()
response = client.images.generate(
    prompt="a bowl of fresh fruit",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    n=4,  # Generate 4 variations
    width=512,
    height=512,
)
# Decode each generated image and save it to disk
for idx, image_data in enumerate(response.data):
    image_bytes = base64.b64decode(image_data.b64_json)
    with open(f"generated_image_{idx}.png", "wb") as f:
        f.write(image_bytes)
CLI Interface
The Together CLI provides a convenient interface for image generation without writing Python code.
Command Structure
together images generate "prompt text" --model <MODEL_NAME> [OPTIONS]
Source: src/together/cli/api/images.py:1-30
CLI Options
| Option | Type | Default | Description |
|---|---|---|---|
| --model | str | Required | Model name to use for generation |
| --steps | int | 20 | Number of diffusion steps |
| --seed | int | None | Random seed for reproducibility |
| --n | int | 1 | Number of images to generate |
| --height | int | 1024 | Image height in pixels |
| --width | int | 1024 | Image width in pixels |
| --negative-prompt | str | None | Elements to avoid in generation |
| --output | path | . | Output directory for generated images |
| --prefix | str | image- | Filename prefix for saved images |
| --no-show | flag | False | Do not open images in viewer |
Source: src/together/cli/api/images.py:31-70
CLI Usage Examples
Basic image generation:
together images generate "space robots" \
--model stabilityai/stable-diffusion-xl-base-1.0 \
--n 4
Custom dimensions with reproducible seed:
together images generate "mountain landscape" \
--model stabilityai/stable-diffusion-xl-base-1.0 \
--seed 12345 \
--width 512 \
--height 768 \
--steps 30
Save to specific directory without viewing:
together images generate "abstract art" \
--model stabilityai/stable-diffusion-xl-base-1.0 \
--output ./generated_images \
--prefix "artwork-" \
--no-show
Image Display Behavior
By default, the CLI automatically opens generated images in the system's default image viewer using the PIL.Image library. This behavior can be disabled with the --no-show flag. Source: src/together/cli/api/images.py:70-90
Supported Models
The Together AI platform supports various image generation models. The SDK allows any compatible model identifier to be passed directly:
| Model Family | Example Model Identifier | Typical Use |
|---|---|---|
| Stable Diffusion XL | stabilityai/stable-diffusion-xl-base-1.0 | General purpose generation |
| Flux | black-forest-labs/FLUX.1-dev | High-quality artistic generation |
| Playground | playgroundai/playground-v2.5 | Versatile creative work |
Source: README.md
To list all available image generation models programmatically:
from together import Together
client = Together()
models = client.models.list()
# Filter for image models
image_models = [m for m in models.data if m.type == "image"]
for model in image_models:
    print(f"{model.display_name}: {model.name}")
Request/Response Flow
sequenceDiagram
participant User
participant Client
participant ImagesResource
participant APIRequestor
participant TogetherAPI
participant ImageResponse
User->>Client: client.images.generate(...)
Client->>ImagesResource: generate(prompt, model, ...)
ImagesResource->>ImageRequest: Create ImageRequest
ImagesResource->>APIRequestor: arequest(POST /images/generations)
APIRequestor->>TogetherAPI: HTTP POST Request
TogetherAPI-->>APIRequestor: JSON Response
APIRequestor-->>ImagesResource: TogetherResponse
ImagesResource->>ImageResponse: Parse response data
ImageResponse-->>User: ImageResponse with image data
Note over User,TogetherAPI: Base64 images available in response.data[].b64_json
Source: src/together/resources/images.py:40-70
Common Issues and Troubleshooting
Pillow Version Compatibility
Some users have reported transitive dependency conflicts with the pillow library. The SDK depends on specific pillow versions for image handling and display features in the CLI. If you encounter conflicts with other packages requiring newer pillow versions, consider using separate virtual environments. Source: GitHub Issue #237
Large Image Base64 Handling
Generated images are returned as Base64-encoded strings in the b64_json field. When processing large images or several images at once, make sure your application has enough memory for both the encoded strings and the decoded bytes. The SDK does not impose a maximum size limit, but the Together API limits images to approximately 10MB when using the Base64-encoded format. Source: src/together/utils/files.py
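Because Base64 expands data by roughly one third, it can help to estimate the decoded size before decoding. The following is a minimal sketch for standard padded Base64; `decoded_size` and `MAX_IMAGE_BYTES` are hypothetical names, and the 10 MiB constant is an assumption derived from the approximate limit mentioned above.

```python
import base64

# Assumed from the "approximately 10MB" limit described in the text
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def decoded_size(b64_string: str) -> int:
    """Return the decoded byte count of padded Base64 without decoding it."""
    stripped = b64_string.strip()
    padding = len(stripped) - len(stripped.rstrip("="))
    # Every 4 Base64 characters carry 3 bytes; each "=" removes one byte
    return len(stripped) * 3 // 4 - padding


sample = base64.b64encode(b"hello world").decode("ascii")
size = decoded_size(sample)
print(size, size < MAX_IMAGE_BYTES)
```

The same arithmetic applies to a `b64_json` value from a response, so an oversized payload can be detected before allocating the decoded bytes.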
API Key Configuration
Image generation requires a valid Together API key. Ensure the TOGETHER_API_KEY environment variable is set or passed directly to the client:
# Via environment variable
# export TOGETHER_API_KEY=your_api_key
client = Together() # Reads from environment
# Or explicitly
client = Together(api_key="your_api_key")
Rate Limiting
Like other API endpoints, image generation is subject to rate limits. If you encounter RateLimitError, implement exponential backoff in your application:
import time
from together import Together
from together.error import RateLimitError
client = Together()
max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.images.generate(
            prompt="your prompt",
            model="stabilityai/stable-diffusion-xl-base-1.0",
        )
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            # Wait 1s, then 2s, doubling on each retry
            wait_time = 2 ** attempt
            time.sleep(wait_time)
        else:
            raise
Source: src/together/error.py:40-55
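The retry loop above can be factored into a reusable helper. This is a sketch under stated assumptions, not part of the SDK: `retry_with_backoff` is a hypothetical function, `FakeRateLimit` stands in for `together.error.RateLimitError` so the example runs offline, and the 10% jitter is an illustrative choice.

```python
import random
import time


def retry_with_backoff(call, retry_on, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry on `retry_on` with exponentially growing delays."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))


class FakeRateLimit(Exception):
    """Hypothetical stand-in for together.error.RateLimitError."""


attempts = []
delays = []


def flaky():
    # Fails twice, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit()
    return "ok"


result = retry_with_backoff(flaky, FakeRateLimit, sleep=delays.append)
print(result, len(attempts))
```

In real use, `call` could be a lambda wrapping `client.images.generate(...)` and `retry_on` could be `RateLimitError`. Injecting `sleep` keeps the helper testable without real waiting.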
Error Handling
The SDK provides specific exception types for various error conditions:
| Exception Type | Description |
|---|---|
| TogetherException | Base exception for all SDK errors |
| RateLimitError | API rate limit exceeded |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout |
Source: src/together/error.py
Example error handling:
from together import Together
from together.error import TogetherException, RateLimitError, Timeout
client = Together()
try:
    response = client.images.generate(
        prompt="test image",
        model="stabilityai/stable-diffusion-xl-base-1.0",
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except Timeout:
    print("Request timed out. The image may be complex - try with fewer steps.")
except TogetherException as e:
    print(f"API error: {e}")
Best Practices
1. Optimize Image Dimensions
For faster generation, use smaller dimensions initially and upscale if needed:
# Faster initial generation
response = client.images.generate(
prompt="landscape",
model="stabilityai/stable-diffusion-xl-base-1.0",
height=512,
width=512,
steps=10,  # Fewer steps than the CLI default of 20, for a quick draft
)
2. Use Seeds for Iteration
When refining a concept, use a fixed seed to maintain consistency:
base_seed = 42
# Generate variations while maintaining composition
for i in range(4):
    response = client.images.generate(
        prompt=f"landscape with {'spring' if i % 2 == 0 else 'autumn'} colors",
        model="stabilityai/stable-diffusion-xl-base-1.0",
        seed=base_seed,
    )
3. Batch Generation
Generate multiple images in a single request when possible for efficiency:
response = client.images.generate(
prompt="concept variations",
model="stabilityai/stable-diffusion-xl-base-1.0",
n=4, # Single API call for 4 images
)
4. Handle Revised Prompts
The API may modify prompts for safety reasons. Always check for revised prompts:
prompt = "your prompt here"
response = client.images.generate(
    prompt=prompt,
    model="stabilityai/stable-diffusion-xl-base-1.0",
)
# Compare each revised prompt against the prompt that was sent
for image_data in response.data:
    if image_data.revised_prompt and image_data.revised_prompt != prompt:
        print(f"Prompt was revised to: {image_data.revised_prompt}")
See Also
- Chat Completions - Text generation with LLMs
- Embeddings - Text vectorization
- Fine-tuning - Custom model training
- Together AI Documentation - Official platform documentation
Files API
Related topics: Fine-Tuning
The Files API provides capabilities for uploading, managing, and validating training datasets for use with Together AI's fine-tuning services. It serves as the foundation for preparing training data that powers model customization workflows.
Overview
The Files API enables developers to:
- Upload training datasets in JSONL format for fine-tuning jobs
- Validate file content locally before uploading to catch formatting errors early
- Manage remote files (list, retrieve, delete) on Together's infrastructure
- Support multimodal content including text and image data for vision model training
Source: README.md
Architecture
graph TD
A[User Code / CLI] --> B[Files API Client]
B --> C[FileManager]
C --> D[Together API]
E[Local Validation] --> B
E --> F[files.py utils]
F --> G[JSONL Parser]
G --> H[Content Validators]
I[Fine-tuning] --> D
I --> C
style D fill:#e1f5fe
style C fill:#fff3e0
style F fill:#f3e5f5
Component Overview
| Component | File | Responsibility |
|---|---|---|
| Together client | filemanager.py | Main API entry point |
| FileManager | filemanager.py | Handles file operations |
| files.py utils | utils/files.py | Local validation and parsing |
| CLI commands | cli/api/files.py | Command-line interface |
Source: src/together/filemanager.py
File Validation
The SDK provides robust local validation capabilities through the files.py utility module. This validation runs before uploads to catch formatting errors early, preventing failed fine-tuning jobs due to malformed data.
Validation Rules
The validator checks multiple aspects of your JSONL files:
| Validation Rule | Description | Error Type |
|---|---|---|
| content field type | Must be a list of dicts | InvalidFileFormatError |
| type field presence | Each item must have a type field | InvalidFileFormatError |
| Text content | For type: "text", must have valid text string | InvalidFileFormatError |
| Image content | For type: "image_url", must have valid image_url dict | InvalidFileFormatError |
| Image size | Base64 images must be under 10MB | InvalidFileFormatError |
| Image limit | Maximum 10 images per example | InvalidFileFormatError |
| Image role | Images only allowed in user messages | InvalidFileFormatError |
Source: src/together/utils/files.py
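To make the rules in the table concrete, here is a simplified sketch of how such checks might look for a single message. It is not the SDK's implementation: `validate_content` and both constants are hypothetical, errors are raised as plain `ValueError` rather than `InvalidFileFormatError`, and the image limit is applied per message here, whereas the table states it per example.

```python
MAX_IMAGES = 10                            # assumed from the table above
MAX_BASE64_IMAGE_BYTES = 10 * 1024 * 1024  # assumed reading of "10MB"


def validate_content(content, role):
    """Check one message's content list; return the number of images found."""
    if not isinstance(content, list) or not all(isinstance(i, dict) for i in content):
        raise ValueError("content must be a list of dicts")
    image_count = 0
    for item in content:
        if "type" not in item:
            raise ValueError("each content item needs a 'type' field")
        if item["type"] == "text":
            if not isinstance(item.get("text"), str):
                raise ValueError("text items need a string 'text' field")
        elif item["type"] == "image_url":
            image_url = item.get("image_url")
            if not isinstance(image_url, dict) or not isinstance(image_url.get("url"), str):
                raise ValueError("image items need an 'image_url' dict with a 'url'")
            if role != "user":
                raise ValueError("images are only allowed in user messages")
            url = image_url["url"]
            if url.startswith("data:") and ";base64," in url:
                payload = url.split(";base64,", 1)[1]
                # Approximate decoded size: 3 bytes per 4 Base64 characters
                if len(payload) * 3 // 4 >= MAX_BASE64_IMAGE_BYTES:
                    raise ValueError("base64 image exceeds the size limit")
            image_count += 1
        else:
            raise ValueError(f"unknown content type: {item['type']!r}")
    if image_count > MAX_IMAGES:
        raise ValueError("too many images")
    return image_count


example = [
    {"type": "text", "text": "Describe this picture"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
]
print(validate_content(example, role="user"))
```

For real datasets, the `together files check` command described later remains the authoritative validation.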
Supported Content Types
# Text content
{"type": "text", "text": "The training prompt here"}
# Image URL content
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
# Base64 image content
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
Multimodal Dataset Structure
The validator supports multimodal datasets for vision model fine-tuning:
graph LR
A[JSONL Line] --> B{Parse content}
B -->|List| C[Validate each item]
B -->|String| D[Plain text]
C --> E{type == "text"?}
C --> F{type == "image_url"?}
E -->|Yes| G[Validate text field]
F -->|Yes| H[Validate image_url dict]
F -->|No| I[Error: Unknown type]
H --> J{URL or Base64?}
J -->|Base64| K[Check size < 10MB]
K --> L[Count images]
J -->|URL| L
Source: src/together/utils/files.py
Python Client Usage
Initialization
from together import Together
client = Together()
The client automatically reads the TOGETHER_API_KEY environment variable. You can also pass the key explicitly:
client = Together(api_key="your-api-key-here")
File Operations
#### Upload a File
response = client.files.upload(
file=open("training_data.jsonl", "rb"),
purpose="fine-tune"
)
print(response.id)
#### List Files
files = client.files.list()
for file in files.data:
    print(f"ID: {file.id}, Filename: {file.filename}, Size: {file.bytes}")
#### Retrieve File Metadata
file_info = client.files.retrieve(file_id="file-xxxxx")
print(f"Created: {file_info.created_at}")
print(f"Filename: {file_info.filename}")
#### Retrieve File Content
content = client.files.retrieve_content(file_id="file-xxxxx")
print(content)
#### Delete a File
result = client.files.delete(file_id="file-xxxxx")
print(result.deleted)
Source: src/together/filemanager.py
CLI Usage
The together files command provides a command-line interface for file operations.
Command Overview
together files --help
| Command | Description |
|---|---|
| together files check | Validate a local file before uploading |
| together files upload | Upload a file to Together AI |
| together files list | List all uploaded files |
| together files retrieve | Get file metadata |
| together files retrieve-content | Download file content |
| together files delete | Delete a remote file |
Source: README.md
Check File (Local Validation)
Validate your JSONL file locally before uploading:
together files check example.jsonl
This runs the same validation logic that the SDK uses, checking:
- JSONL format validity
- Content structure
- Multimodal content rules
- Image size limits
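As a rough illustration of the first check in the list (JSONL format validity), the sketch below reports lines that are not valid JSON objects. `find_invalid_jsonl_lines` is a hypothetical helper written for this example and does not reproduce what `together files check` does internally.

```python
import json


def find_invalid_jsonl_lines(lines):
    """Return (line_number, message) for each non-empty line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
    return problems


sample_lines = [
    '{"text": "What is the capital of France?\\nAnswer: Paris"}',
    '{"text": "missing closing brace"',
    "[1, 2, 3]",
]
problems = find_invalid_jsonl_lines(sample_lines)
for number, message in problems:
    print(f"line {number}: {message}")
```

Reporting line numbers makes it easier to locate the offending record in a large training file.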
Upload a File
together files upload example.jsonl
List Files
together files list
Retrieve File Metadata
together files retrieve file-6f50f9d1-5b95-416c-9040-0799b2b4b894
Retrieve File Content
together files retrieve-content file-6f50f9d1-5b95-416c-9040-0799b2b4b894
Delete a Remote File
together files delete file-6f50f9d1-5b95-416c-9040-0799b2b4b894
Data Flow for Fine-tuning
The Files API integrates directly with the Fine-tuning API. Here's how files flow through the system:
sequenceDiagram
participant User
participant CLI as Files CLI
participant SDK as Python SDK
participant API as Together API
participant FT as Fine-tuning
User->>CLI: together files upload data.jsonl
CLI->>SDK: client.files.upload()
SDK->>SDK: Validate locally
SDK->>API: POST /v1/files
API-->>SDK: {id: "file-xxxxx"}
SDK-->>CLI: File upload response
User->>CLI: together fine-tuning create
CLI->>SDK: client.fine_tuning.create(training_file="file-xxxxx")
SDK->>API: POST /v1/fine_tuning/jobs
API-->>SDK: {id: "ft-xxxxx"}
SDK-->>CLI: Fine-tuning job response
Error Handling
The SDK defines several exception types for file-related errors:
| Exception | Use Case |
|---|---|
| TogetherException | Base exception class |
| FileTypeError | Invalid file type or format |
| APIConnectionError | Network connectivity issues |
| Timeout | Request timeout |
Source: src/together/error.py
Handling Upload Errors
from together import Together
from together.error import FileTypeError, APIConnectionError
client = Together()
try:
    response = client.files.upload(
        file=open("data.jsonl", "rb"),
        purpose="fine-tune",
    )
except FileTypeError as e:
    print(f"Invalid file format: {e}")
except APIConnectionError as e:
    print(f"Connection error: {e}")
Common Issues
File Format Validation Failures
The local validation (together files check) should be run before uploading. This catches the most common issues:
- Missing type field: Every content item must have a type field
- Invalid type value: Must be either "text" or "image_url"
- Missing text field: Text items must have a text string field
- Image in non-user message: Images are only allowed in user messages
- Base64 size exceeded: Images must be under 10MB when base64-encoded
Fine-tuning Integration
Files uploaded via the Files API can be used in fine-tuning jobs:
from together import Together
client = Together()
# Upload training file
training_file = client.files.upload(
file=open("train.jsonl", "rb"),
purpose="fine-tune"
)
# Create fine-tuning job with uploaded file
job = client.fine_tuning.create(
training_file=training_file.id,
model="meta-llama/Llama-4-Scout-17B-16E-Instruct"
)
Source: src/together/resources/finetune.py
Configuration Options
File Upload Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file-like object | Yes | File to upload |
| purpose | string | Yes | Intended use (e.g., "fine-tune") |
File Check Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | string | Yes | Path to local file |
Best Practices
- Always validate locally first: Run together files check before uploading to catch format errors early
- Use descriptive filenames: Makes files easier to identify in the file list
- Check file size: Large files may take longer to upload and process
- Verify JSONL format: Ensure each line is valid JSON
- Test with small dataset first: Validate your pipeline with a subset before full upload
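For the last practice, one way to produce a small test dataset is to copy the first few lines of the full JSONL file. This is a sketch only: `write_subset` and the file names are hypothetical, and it takes the leading lines rather than a random sample.

```python
import itertools
import os
import tempfile


def write_subset(source_path, target_path, max_lines):
    """Copy at most `max_lines` lines from one JSONL file to another."""
    written = 0
    with open(source_path, "r", encoding="utf-8") as source, \
            open(target_path, "w", encoding="utf-8") as target:
        for line in itertools.islice(source, max_lines):
            target.write(line)
            written += 1
    return written


with tempfile.TemporaryDirectory() as workdir:
    full_path = os.path.join(workdir, "train.jsonl")
    sample_path = os.path.join(workdir, "train.sample.jsonl")
    # Create a throwaway 10-line dataset for the demonstration
    with open(full_path, "w", encoding="utf-8") as handle:
        for i in range(10):
            handle.write(f'{{"text": "example {i}"}}\n')
    written = write_subset(full_path, sample_path, max_lines=3)
    with open(sample_path, encoding="utf-8") as handle:
        sample_lines = handle.read().splitlines()

print(written, sample_lines[0])
```

The resulting sample file can be run through `together files check` and a short fine-tuning job before the full dataset is uploaded.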
See Also
- Fine-tuning Guide - Complete fine-tuning workflow using uploaded files
- Chat Completions - Using models after fine-tuning
- CLI Reference - Complete CLI documentation
Fine-Tuning
Related topics: Files API
The Fine-Tuning module in the Together Python SDK provides a comprehensive interface for customizing foundation models on the Together Inference API. This module enables developers to adapt pre-trained models to their specific use cases through supervised fine-tuning, LoRA (Low-Rank Adaptation), and advanced alignment methods like DPO (Direct Preference Optimization).
Overview
Fine-tuning transforms a pre-trained model into a specialized tool tailored for specific tasks, domains, or behaviors. The Together platform supports multiple fine-tuning methodologies:
| Training Method | Description | Use Case |
|---|---|---|
| Full Training | Updates all model weights | Maximum customization, larger datasets |
| LoRA | Low-Rank Adaptation with adapter weights | Efficient fine-tuning, lower compute costs |
| DPO | Direct Preference Optimization | Alignment and preference learning |
| RPO | Relative Preference Optimization | Alternative alignment approach |
| SimPO | Simple Preference Optimization | Simplified alignment without reference model |
Source: src/together/resources/finetune.py
Architecture
The fine-tuning system follows a layered architecture with the FineTuning class serving as the primary interface:
graph TD
A[User Application] --> B[Together Client]
B --> C[FineTuning Class]
C --> D[APIRequestor]
D --> E[Together Inference API]
F[CLI Commands] --> C
G[Legacy API] --> C
H[File Validation] --> C
I[Checkpoint Management] --> C
J[Price Estimation] --> C
Core Components
| Component | Location | Purpose |
|---|---|---|
| FineTuning | resources/finetune.py | Main API interface for fine-tuning operations |
| FineTuneCreateRequest | types/finetune.py | Request payload model for job creation |
| CLI Commands | cli/api/finetune.py | Command-line interface for fine-tuning |
| Legacy API | legacy/finetune.py | Backward-compatible wrapper functions |
| File Validation | utils/files.py | Dataset file format validation |
Creating Fine-Tuning Jobs
Python Client
The FineTuning.create() method initiates a new fine-tuning job. The method accepts numerous parameters to customize the training process:
from together import Together
client = Together()
response = client.fine_tuning.create(
model="meta-llama/Llama-3-8b-hf",
training_file="file-abc123",
validation_file="file-def456",
n_epochs=3,
batch_size=4,
learning_rate=1e-5,
suffix="my-custom-model",
wandb_api_key="your-wandb-key",
wandb_project_name="my-project",
)
print(response)
Source: src/together/resources/finetune.py
Supported Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | Required | Base model identifier (e.g., meta-llama/Llama-3-8b-hf) |
| training_file | str | Required | Uploaded training file ID |
| validation_file | str | Optional | Uploaded validation file ID |
| n_epochs | int | 3 | Number of training epochs |
| n_checkpoints | int | 1 | Number of checkpoints to save |
| batch_size | int | Auto | Training batch size |
| learning_rate | float | 1e-5 | Initial learning rate |
| lr_scheduler_type | str | cosine | Learning rate scheduler |
| warmup_ratio | float | 0.1 | Warmup ratio for learning rate |
| weight_decay | float | 0.01 | Weight decay coefficient |
| max_grad_norm | float | 1.0 | Maximum gradient norm |
| suffix | str | None | Custom suffix for output model name |
| lora | bool | False | Enable LoRA fine-tuning |
| lora_r | int | 8 | LoRA attention dimension |
| lora_dropout | float | 0.05 | LoRA dropout probability |
| lora_alpha | int | 16 | LoRA alpha parameter |
| train_on_inputs | bool | None | Whether to train on user inputs instead of masking them |
| train_vision | bool | False | Train vision encoder (multimodal models) |
| training_method | str | sft | Training method (dpo, rpo, simpo) |
| from_checkpoint | str | None | Resume from previous job checkpoint |
| from_hf_model | str | None | HuggingFace model to continue training from |
Source: src/together/resources/finetune.py
Async Support
For asynchronous applications, use AsyncTogether with the async FineTuning methods:
import asyncio
from together import AsyncTogether
async_client = AsyncTogether()
async def create_ft_job():
    response = await async_client.fine_tuning.create(
        model="meta-llama/Llama-3-8b-hf",
        training_file="file-abc123",
        n_epochs=3,
    )
    return response
result = asyncio.run(create_ft_job())
Source: src/together/resources/finetune.py
Managing Fine-Tuning Jobs
Job Lifecycle
stateDiagram-v2
[*] --> Created: create()
Created --> Queued: Submitted
Queued --> Running: Started
Running --> Completed: Success
Running --> Failed: Error
Running --> Cancelled: cancel()
Queued --> Cancelled: cancel()
Listing Jobs
Retrieve all fine-tuning jobs associated with your account:
response = client.fine_tuning.list()
for job in response.data:
    print(f"ID: {job.id}, Model: {job.model}, Status: {job.status}")
Retrieving Job Details
Get detailed information about a specific fine-tuning job:
job = client.fine_tuning.retrieve(id="ft-job-abc123")
print(f"Status: {job.status}")
print(f"Training steps: {job.training_steps}")
print(f"Output model: {job.output_name}")
Cancelling Jobs
Abort a running or queued fine-tuning job:
result = client.fine_tuning.cancel(id="ft-job-abc123")
Source: src/together/resources/finetune.py
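The job lifecycle suggests a simple polling loop that waits until a job reaches a terminal state. The sketch below is hypothetical: `wait_for_job` is not an SDK function, the lowercase status strings are assumed from the state names in the lifecycle diagram, and `fetch_status` is injected so the example runs without network access.

```python
import time

# Assumed terminal states, taken from the lifecycle diagram
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status, poll_interval=30.0, max_polls=100, sleep=time.sleep):
    """Poll `fetch_status()` until it returns a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal status in time")


# Simulated status sequence standing in for repeated retrieve() calls
fake_statuses = iter(["queued", "running", "running", "completed"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```

In real use, `fetch_status` might wrap `client.fine_tuning.retrieve(...)` and convert its `status` attribute to a lowercase string; the exact status values returned by the API should be checked against the SDK before relying on this set.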
Checkpoint Management
Checkpoints enable resuming training from intermediate states and retrieving model weights for deployment.
Retrieving Checkpoints
checkpoints = client.fine_tuning.checkpoints(id="ft-job-abc123")
for checkpoint in checkpoints.data:
    print(f"Step: {checkpoint.step}, Type: {checkpoint.checkpoint_type}")
The _parse_raw_checkpoints() helper processes raw checkpoint metadata:
parsed_checkpoints = []
for checkpoint in checkpoints:
    step = checkpoint["step"]
    checkpoint_type = checkpoint["checkpoint_type"]
    # Intermediate checkpoints are named "<job id>:<step>"; the final one uses the job id alone
    checkpoint_name = (
        f"{id}:{step}" if "intermediate" in checkpoint_type.lower() else id
    )
    parsed_checkpoints.append(
        FinetuneCheckpoint(
            type=checkpoint_type,
            timestamp=checkpoint["created_at"],
            name=checkpoint_name,
        )
    )
Source: src/together/resources/finetune.py
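The naming rule in the excerpt above can be isolated and run on plain dictionaries. In this sketch, `checkpoint_name`, the job ID, and the checkpoint type strings "Intermediate" and "Final" are illustrative assumptions; only the rule itself (append the step when the type contains "intermediate") comes from the excerpt.

```python
def checkpoint_name(job_id, step, checkpoint_type):
    """Return "<job_id>:<step>" for intermediate checkpoints, else the job ID."""
    if "intermediate" in checkpoint_type.lower():
        return f"{job_id}:{step}"
    return job_id


# Hypothetical raw checkpoint metadata
raw_checkpoints = [
    {"step": 100, "checkpoint_type": "Intermediate"},
    {"step": 200, "checkpoint_type": "Intermediate"},
    {"step": 300, "checkpoint_type": "Final"},
]
names = [
    checkpoint_name("ft-abc123", c["step"], c["checkpoint_type"])
    for c in raw_checkpoints
]
print(names)
```

Names of the form `job:step` match the shape of the value used with `from_checkpoint` when resuming training from a specific step.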
Download Checkpoints
Download fine-tuned model weights using the CLI:
# Download latest checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b
# Download specific checkpoint
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-step 1000
# Download with specific checkpoint type
together fine-tuning download ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --checkpoint-type merged
#### Checkpoint Types
| Type | Description | Applicable Training |
|---|---|---|
| default | Default output format | All |
| merged | Merged with base model (LoRA only) | LoRA |
| adapter | Adapter weights only (LoRA only) | LoRA |
| model_output_path | Full model output (Full only) | Full |
Source: src/together/cli/api/finetune.py
Download Options
| CLI Option | Description |
|---|---|
| --output_dir, -o | Output directory for downloaded files |
| --checkpoint-step, -s | Specific checkpoint step to download |
| --checkpoint-type | Checkpoint type (default, merged, adapter) |
The same download is available from the Python client:
from together.types.finetune import DownloadCheckpointType
result = client.fine_tuning.download(
    fine_tune_id="ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b",
    output="./model-output",
    checkpoint_step=1000,
    checkpoint_type=DownloadCheckpointType.MERGED,
)
print(f"Downloaded to: {result.filename}")
CLI Commands
The Together CLI provides a comprehensive set of commands for fine-tuning operations:
Create a Fine-Tuning Job
together fine-tuning create \
--model meta-llama/Llama-3-8b-hf \
--training-file file-abc123 \
--n-epochs 3 \
--suffix my-custom-model
List Fine-Tuning Jobs
together fine-tuning list
Retrieve Job Details
together fine-tuning retrieve ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b
Cancel a Job
# Cancel a running or queued job (with confirmation prompt)
together fine-tuning cancel ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b
Delete a Job
together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b
# Force deletion without confirmation
together fine-tuning delete ft-c66a5c18-1d6d-43c9-94bd-32d756425b4b --force
Source: src/together/cli/api/finetune.py
Weights & Biases Integration
The SDK supports automatic logging to Weights & Biases for experiment tracking:
together fine-tuning create \
--model meta-llama/Llama-3-8b-hf \
--training-file file-abc123 \
--wandb-api-key your-api-key \
--wandb-project-name my-project \
--wandb-name my-experiment-run
| Parameter | Description |
|---|---|
| --wandb-api-key | Weights & Biases API key |
| --wandb-project-name | W&B project name |
| --wandb-name | W&B run name |
| --wandb-base-url | W&B base URL (for enterprise deployments) |
Source: src/together/cli/api/finetune.py
File Format Requirements
Training and validation files must follow specific JSONL (JSON Lines) format requirements:
Instruction Tuning Format
{"text": "What is the capital of France?\nAnswer: Paris"}
Chat/Conversation Format
{"content": [{"type": "text", "text": "What is the capital of France?"}], "role": "user"}
{"content": [{"type": "text", "text": "Paris"}], "role": "assistant"}
Multimodal Format (with Images)
{"content": [{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}], "role": "user"}
Validation Rules
The file validation system enforces the following rules:
| Rule | Error | Source |
|---|---|---|
| File must be valid JSONL | InvalidFileFormatError | utils/files.py |
| Content must be a list of dicts | InvalidFileFormatError | utils/files.py |
| Each item must have type field | InvalidFileFormatError | utils/files.py |
| Text items must have text field (string) | InvalidFileFormatError | utils/files.py |
| Image items must be in user messages only | InvalidFileFormatError | utils/files.py |
| Image items must have image_url dict | InvalidFileFormatError | utils/files.py |
Source: src/together/utils/files.py
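The content rules in the table can be illustrated with a small local pre-check. This is a sketch written from the table alone, not the code in `utils/files.py`; the names `InvalidContentError`, `validate_content`, and `is_valid` are hypothetical, and the SDK's own validator may accept or reject cases this one does not.

```python
class InvalidContentError(ValueError):
    """Hypothetical stand-in for the SDK's InvalidFileFormatError."""


def validate_content(content, role):
    """Check one message's content list against the rules listed in the table."""
    if not isinstance(content, list) or not all(isinstance(i, dict) for i in content):
        raise InvalidContentError("content must be a list of dicts")
    for item in content:
        if "type" not in item:
            raise InvalidContentError("each item must have a 'type' field")
        if item["type"] == "text":
            if not isinstance(item.get("text"), str):
                raise InvalidContentError("text items need a string 'text' field")
        elif item["type"] == "image_url":
            if role != "user":
                raise InvalidContentError("image items are only allowed in user messages")
            if not isinstance(item.get("image_url"), dict):
                raise InvalidContentError("image items need an 'image_url' dict")


def is_valid(content, role):
    """Return True when the content passes every check, False otherwise."""
    try:
        validate_content(content, role)
        return True
    except InvalidContentError:
        return False
```

For example, `is_valid([{"type": "text", "text": "Paris"}], "assistant")` passes, while the same image item that is valid for a `user` message is rejected for an `assistant` message.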
Error Handling
The fine-tuning module defines specific exception types for different failure scenarios:
Exception Types
| Exception | Use Case |
|---|---|
| `TogetherException` | Base exception class |
| `RateLimitError` | API rate limit exceeded |
| `FileTypeError` | Invalid file format |
| `APIConnectionError` | Network connectivity issues |
| `Timeout` | Request timeout |
Source: src/together/error.py
Error Response Model
from together.types.error import TogetherErrorResponse
error_response = TogetherErrorResponse(
message="Invalid training file format",
type="validation_error",
param="training_file",
code="INVALID_FORMAT"
)
Handling Errors
from together import Together
from together.error import RateLimitError, TogetherException
client = Together()
try:
response = client.fine_tuning.create(
model="meta-llama/Llama-3-8b-hf",
training_file="file-abc123",
)
except RateLimitError:
print("Rate limit exceeded. Please wait and retry.")
except TogetherException as e:
print(f"Fine-tuning error: {e}")
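The "wait and retry" advice for rate limits can be made concrete with exponential backoff. The sketch below is a generic helper, not something the SDK provides: `call_with_backoff` and `FakeRateLimit` are hypothetical names, and `FakeRateLimit` stands in for `RateLimitError` only so the example runs without network access.

```python
import time


def call_with_backoff(fn, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


class FakeRateLimit(Exception):
    """Stand-in for together.error.RateLimitError so the sketch runs offline."""


attempts = []
delays = []


def flaky():
    """Fail twice with a rate-limit error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit()
    return "submitted"


# Record the delays instead of actually sleeping.
result = call_with_backoff(flaky, FakeRateLimit, sleep=delays.append)
```

With the real client, the same helper could wrap the call as `call_with_backoff(lambda: client.fine_tuning.create(...), retry_on=RateLimitError)`.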
Legacy API
The SDK provides backward-compatible wrapper functions in the legacy module:
from together.legacy import finetune
# These functions are deprecated but still functional
response = finetune.create(
training_file="file-abc123",
model="meta-llama/Llama-3-8b-hf",
n_epochs=3,
)
⚠️ Warning: The legacy functions emit deprecation warnings. Migrate to the new client.fine_tuning interface for new projects.
Source: src/together/legacy/finetune.py
Common Patterns
Resuming from Checkpoint
Continue training from a previous fine-tuning job:
response = client.fine_tuning.create(
model="meta-llama/Llama-3-8b-hf",
training_file="file-abc123",
from_checkpoint="ft-previous-job:1000", # Resume from step 1000
)
Fine-tuning from HuggingFace Model
Start training from a HuggingFace Hub model:
response = client.fine_tuning.create(
model="meta-llama/Llama-3-8b-hf",
training_file="file-abc123",
from_hf_model="username/my-finetuned-model",
hf_model_revision="v1.0",
)
Training with Price Limits
The SDK includes price estimation to prevent unexpected costs:
price_estimation = client.fine_tuning.estimate_price(
training_file="file-abc123",
model="meta-llama/Llama-3-8b-hf",
n_epochs=3,
training_type="lora",
)
if price_estimation.allowed_to_proceed:
response = client.fine_tuning.create(...)
else:
print(f"Estimated cost ${price_estimation.estimated_cost} exceeds limit")
Source: src/together/resources/finetune.py
Price Estimation
The price estimation feature helps users understand the expected cost before starting a fine-tuning job:
graph LR
A[User Creates Job] --> B{from_checkpoint or from_hf_model?}
B -->|No| C[Estimate Price]
B -->|Yes| D[Skip Estimation]
C --> E{Cost within limits?}
E -->|Yes| F[Submit Job]
E -->|No| G[Show Warning]
D --> F
Price estimation is automatically performed when creating jobs without a checkpoint or HuggingFace model source, unless explicitly disabled.
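The branching in the diagram can be restated as a small decision function. This is a sketch of the flow as drawn, not the SDK's internal logic: `next_step` and its `within_limits` parameter are hypothetical, and the "explicitly disabled" case is not modelled.

```python
def next_step(from_checkpoint=None, from_hf_model=None, within_limits=True):
    """Return the flowchart outcome: 'submit' or 'warn'."""
    if from_checkpoint is not None or from_hf_model is not None:
        # Estimation is skipped for checkpoint or HuggingFace sources (D --> F).
        return "submit"
    # Otherwise the estimate decides the outcome (C --> E --> F or G).
    return "submit" if within_limits else "warn"
```

A job resumed with `from_checkpoint="ft-previous-job:1000"` goes straight to submission, while a fresh job whose estimate exceeds the limit ends at the warning.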
See Also
- Chat Completions - Using fine-tuned models for inference
- Files - Uploading training and validation datasets
- Models - Available base models for fine-tuning
- Getting Started - SDK installation and authentication
- CLI Reference - Complete CLI command documentation
Source: https://github.com/togethercomputer/together-python / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
The project should not be treated as fully validated until this signal is reviewed.
Doramagic extracted 12 source-linked risk signals. Review them before installing or handing real data to the project.
1. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:624113979 | https://github.com/togethercomputer/together-python | README/documentation is current enough for a first validation pass.
2. Project risk: v.1.5.31
- Severity: medium
- Finding: Project risk is backed by a source signal: v.1.5.31. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.31
3. Project risk: v.1.5.33
- Severity: medium
- Finding: Project risk is backed by a source signal: v.1.5.33. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.33
4. Project risk: v1.5.28
- Severity: medium
- Finding: Project risk is backed by a source signal: v1.5.28. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v1.5.28
5. Maintenance risk: v.1.5.29
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: v.1.5.29. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v.1.5.29
6. Maintenance risk: v1.5.27
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: v1.5.27. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/releases/tag/v1.5.27
7. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | last_activity_observed missing
8. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium
9. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:624113979 | https://github.com/togethercomputer/together-python | no_demo; severity=medium
10. Security or permission risk: `LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: `LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/togethercomputer/together-python/issues/443
11. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | issue_or_pr_quality=unknown
12. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:624113979 | https://github.com/togethercomputer/together-python | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using together-python with real data or production workflows.
- [`LogProbs.top_logprobs` typed as `Dict` but API returns `List[Dict]`](https://github.com/togethercomputer/together-python/issues/443) - github / github_issue
- v1.5.35 - github / github_release
- v.1.5.33 - github / github_release
- v.1.5.31 - github / github_release
- v.1.5.29 - github / github_release
- v1.5.28 - github / github_release
- v1.5.27 - github / github_release
- README/documentation is current enough for a first validation pass. - GitHub / issue
Source: Project Pack community evidence and pitfall evidence