# https://github.com/AI4Finance-Foundation/FinRL Project Manual

Generated at: 2026-06-17 20:06:59 UTC

## Table of Contents

- [FinRL Architecture, Three-Layer Framework & Project Layout](#page-1)
- [Market Environments: StockTradingEnv, Crypto, Portfolio & Variants](#page-2)
- [DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors](#page-3)
- [Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications](#page-4)

<a id='page-1'></a>

## FinRL Architecture, Three-Layer Framework & Project Layout

### Related Pages

Related topics: [Market Environments: StockTradingEnv, Crypto, Portfolio & Variants](#page-2), [DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors](#page-3), [Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md)
- [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md)
- [finrl/agents/elegantrl/models.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/elegantrl/models.py)
- [examples/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/README.md)
- [finrl/agents/portfolio_optimization/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/portfolio_optimization/README.md)
- [finrl/agents/portfolio_optimization/algorithms.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/portfolio_optimization/algorithms.py)
- [finrl/agents/stablebaselines3/tune_sb3.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/stablebaselines3/tune_sb3.py)
- [finrl/meta/env_portfolio_optimization/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_portfolio_optimization/README.md)
</details>


## Overview & Ecosystem Position

FinRL is the original open-source deep reinforcement learning (DRL) framework for finance, positioned as the classic end-to-end research and educational library within the broader AI4Finance ecosystem. It is explicitly distinguished from its successor **FinRL-X / FinRL-Trading**, which is the next-generation AI-native production stack. Source: [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).

The ecosystem roadmap identifies four generations of libraries: **FinRL-Meta** (gym-style market environments), **FinRL** (classic end-to-end train-test-trade pipeline), **ElegantRL** (lightweight DRL algorithms), and **FinRL-X** (production-oriented modular infrastructure). Source: [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).

## Three-Layer Architecture

The core design follows a three-layer coupled architecture that separates trading tasks, RL algorithms, and market environments into distinct layers, so users can swap in a different DRL library in a plug-and-play fashion. Source: [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md).

```mermaid
flowchart TB
    A["Applications Layer<br/>(cryptocurrency_trading, stock_trading,<br/>portfolio_allocation, high_frequency_trading)"]
    B["Agents Layer<br/>(elegantrl, stablebaselines3, rllib,<br/>portfolio_optimization)"]
    C["Meta Layer<br/>(data_processors, preprocessor,<br/>env_stock_trading, env_cryptocurrency_trading,<br/>env_portfolio_allocation)"]
    A --> B
    B --> C
    C --> A
```

### Applications Layer

This top layer contains the financial tasks and orchestration scripts. The repository currently ships with `cryptocurrency_trading`, `high_frequency_trading`, `portfolio_allocation`, and `stock_trading` as first-class task folders. The end-to-end train-test-trade pipeline is implemented across `train.py`, `test.py`, `trade.py`, and the entry point `main.py`. Source: [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md).

### Agents Layer

This layer exposes DRL algorithm integrations through three backends plus a dedicated portfolio-optimization stack:

- **ElegantRL** — exposes `AgentDDPG`, `AgentTD3`, `AgentSAC`, `AgentPPO`, and `AgentA2C` through a `MODELS` dictionary, with `OFF_POLICY_MODELS = ["ddpg", "td3", "sac"]` and `ON_POLICY_MODELS = ["ppo"]`. Source: [finrl/agents/elegantrl/models.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/elegantrl/models.py).
- **Stable Baselines 3** — used by the FinRL Stock Trading 2026 tutorial to train A2C, DDPG, PPO, SAC, and TD3. Source: [examples/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/README.md).
- **RLlib** — production-grade distributed training backend. Source: [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md).
- **Portfolio Optimization agents** — implement a custom `PolicyGradient` algorithm following *Jiang et al*, with optional EIIE convolutional architecture (`k_size` parameter) and online-learning evaluation mode. Source: [finrl/agents/portfolio_optimization/algorithms.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/portfolio_optimization/algorithms.py).

### Meta Layer (Market Environments & Data)

The bottom layer houses Gymnasium-style market environments, data processors, and preprocessors. Environments are merged from the active FinRL-Meta repository, and include `env_stock_trading`, `env_cryptocurrency_trading`, `env_portfolio_allocation`, and `env_portfolio_optimization`. Source: [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md).

The portfolio-optimization environment, for example, expects a dataframe with `date` and `tic` columns and an action space shaped `(n+1,)` representing the cash plus `n` stock allocation percentages. Source: [finrl/meta/env_portfolio_optimization/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_portfolio_optimization/README.md).

The data-processor layer also supports 14+ external sources including Alpaca, Baostock, CCXT, IEXCloud, JoinQuant, QuantConnect, RiceQuant, Tushare, and Yahoo Finance, with OHLCV plus technical indicators. Source: [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).

## Project File Structure

The top-level layout mirrors the three-layer architecture. Key files include:

| Path | Purpose |
|------|---------|
| `finrl/main.py` | CLI entry point for `--mode=train` / `test` / `trade` |
| `finrl/config.py` | Global configuration: tickers, dates, indicators, hyperparameters |
| `finrl/config_tickers.py` | Symbol universe definitions (DOW 30, NASDAQ 100, etc.) |
| `finrl/train.py` | Training pipeline wrapper |
| `finrl/test.py` | Backtesting / evaluation |
| `finrl/trade.py` | Live / paper trading orchestration |
| `finrl/plot.py` | Performance visualization |
| `finrl/agents/` | DRL algorithm backends |
| `finrl/meta/` | Environments, preprocessors, data processors |
| `finrl/applications/` | Stock, crypto, portfolio, HFT task scripts |
| `examples/` | Standalone tutorials (e.g., Stock Trading 2026) |
| `unit_tests/` | Environment and downloader tests |

Source: [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md) and [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).

## Train-Test-Trade Pipeline & Hyperparameter Tuning

The canonical FinRL workflow, as demonstrated in the v0.3.8 Stock Trading 2026 tutorial, runs in three scripts: data download and preprocessing, DRL agent training (5 algorithms in one pass), and backtesting against baselines such as MVO and DJIA. Source: [examples/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/README.md).

Hyperparameter search is supported via Optuna. The `LoggingCallback` class in `tune_sb3.py` tracks the `previous_best_value` study attribute, prunes trials after a configurable `trial_number` threshold, and applies a `patience` window for Sharpe-ratio improvement. Source: [finrl/agents/stablebaselines3/tune_sb3.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/stablebaselines3/tune_sb3.py).
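
The patience idea described above can be sketched without Optuna. This is an illustrative stand-in, not the code in `tune_sb3.py`: the class name `PatienceStopper`, its `update` method, and the exact stopping rule are assumptions chosen to show how a threshold, a minimum trial count, and a patience window interact.

```python
class PatienceStopper:
    """Hypothetical early-stopping rule for a hyperparameter search."""

    def __init__(self, threshold, min_trials, patience):
        self.threshold = threshold    # minimum Sharpe-ratio gain that counts as progress
        self.min_trials = min_trials  # never stop before this trial number
        self.patience = patience      # stale trials tolerated before stopping
        self.best = None
        self.stale = 0

    def update(self, trial_number, value):
        """Record one trial result; return True when the search should stop."""
        if self.best is None or value - self.best > self.threshold:
            self.best = value
            self.stale = 0
        else:
            self.stale += 1
            self.best = max(self.best, value)
        return trial_number >= self.min_trials and self.stale >= self.patience


stopper = PatienceStopper(threshold=0.05, min_trials=3, patience=2)
sharpe_values = [1.0, 1.2, 1.21, 1.22, 1.5]  # illustrative numbers only
stopped_at = None
for number, sharpe in enumerate(sharpe_values):
    if stopper.update(number, sharpe):
        stopped_at = number
        break
```

With these illustrative values the search stops before the last trial, because two consecutive trials improved the best Sharpe ratio by less than the threshold.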

## Known Limitations & Community-Reported Issues

Several recurring community issues map directly to architectural seams in this three-layer coupling:

- **Gymnasium API drift**: `StockTradingEnv.reset()` does not accept the `seed` keyword that Gymnasium-style callers pass, so older tutorials break under Gymnasium 0.28.1. Source: community issue #1013.
- **Off-policy vs. on-policy callback mismatch**: FinRL 0.3.8 logging assumes a `rollout_buffer`, but DDPG/TD3/SAC use a `replay_buffer`, causing training failures. Source: community issue #1395.
- **Threading bug in paper trading**: `paper_trading/alpaca.py` called `submitOrder` immediately instead of passing it as the thread `target` argument. Source: community issues #1399 and #1414.
- **Short-selling controls**: A proposed `allow_short_selling: bool = True` parameter would extend `StockTradingEnv` action bounds. Source: community issue #1255.
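
The threading item in the list above is an instance of a common Python pitfall. The following minimal sketch uses a hypothetical `submit_order` function as a stand-in; it is not the Alpaca paper-trading code, only an illustration of calling a function immediately versus passing it as the thread `target`.

```python
import threading


def submit_order(qty, symbol, side, log):
    """Hypothetical stand-in for an order-submission call."""
    log.append((threading.current_thread().name, symbol, side, qty))


log = []

# Buggy pattern: submit_order(...) executes right here in the calling thread,
# and its return value (None) is what gets passed as the thread target.
buggy = threading.Thread(
    target=submit_order(10, "AAPL", "buy", log), name="worker-buggy"
)
buggy.start()
buggy.join()

# Correct pattern: pass the callable and its arguments separately,
# so the call happens inside the new thread.
fixed = threading.Thread(
    target=submit_order, args=(10, "MSFT", "buy", log), name="worker-fixed"
)
fixed.start()
fixed.join()
```

After running this, the first log entry records the name of the calling thread rather than `worker-buggy`, while the second records `worker-fixed`.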

These issues reflect the framework's status as a research prototype: the README itself redirects production users to **FinRL-X / FinRL-Trading**, which is described as fully decoupled, type-safe (Pydantic), and production-oriented. Source: [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).

## See Also

- [FinRL Stock Trading 2026 Tutorial](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/README.md)
- [FinRL-Meta market environments](https://github.com/AI4Finance-Foundation/FinRL-Meta)
- [ElegantRL algorithm library](https://github.com/AI4Finance-Foundation/ElegantRL)
- [FinRL-X / FinRL-Trading (next generation)](https://github.com/AI4Finance-Foundation/FinRL-Trading)

---

<a id='page-2'></a>

## Market Environments: StockTradingEnv, Crypto, Portfolio & Variants

### Related Pages

Related topics: [FinRL Architecture, Three-Layer Framework & Project Layout](#page-1), [DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors](#page-3), [Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [finrl/meta/env_stock_trading/env_stocktrading.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_stock_trading/env_stocktrading.py)
- [finrl/meta/env_stock_trading/env_stocktrading_np.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_stock_trading/env_stocktrading_np.py)
- [finrl/meta/env_stock_trading/env_stocktrading_cashpenalty.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_stock_trading/env_stocktrading_cashpenalty.py)
- [finrl/meta/env_stock_trading/env_stocktrading_stoploss.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_stock_trading/env_stocktrading_stoploss.py)
- [finrl/meta/env_portfolio_optimization/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/env_portfolio_optimization/README.md)
- [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md)
- [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md)
- [examples/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/README.md)
- [finrl/applications/Stock_NeurIPS2018/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/applications/Stock_NeurIPS2018/README.md)
</details>


## Overview and Role in the FinRL Stack

FinRL's market environments form the bottom layer of its three-layer architecture (Market Environments, DRL Agents, Financial Applications), providing Gymnasium-compatible simulators in which reinforcement learning agents learn sequential trading policies. As described in [README.md](README.md), these environments expose OHLCV data, technical indicators, and turbulence indices as state observations, and reward agents based on the change in portfolio value after each continuous or discrete action is executed.

The environment layer is split across multiple sub-modules, each targeting a financial task. The [finrl/README.md](finrl/README.md) layout shows that `env_stock_trading`, `env_cryptocurrency_trading`, and `env_portfolio_allocation` live under `finrl/meta/`, while the applications layer in `finrl/applications/Stock_NeurIPS2018` ties a specific environment variant (the original 2018 NeurIPS stock-trading env) to a reproducible training pipeline. Users instantiate an environment, wrap it with a DRL agent from `finrl/agents/` (Stable Baselines 3, ElegantRL, or RLlib), then drive it through the canonical `train.py` → `test.py` → `trade.py` pipeline.

The v0.3.8 release (Stock Trading 2026 tutorial) standardizes the workflow on the five DRL agents — A2C, DDPG, PPO, TD3, SAC — and the classic `StockTradingEnv` family. Each of these algorithms has a slightly different interaction with the environment, and several community-reported bugs stem from this interaction (see [Common Failure Modes](#common-failure-modes)).

## Stock Trading Environments and Variants

The most-used environment is `StockTradingEnv` in [finrl/meta/env_stock_trading/env_stocktrading.py](finrl/meta/env_stock_trading/env_stocktrading.py). It accepts a DataFrame containing price, technical-indicator, and turbulence columns along with the number of stocks (`stock_dim`), then exposes a multi-dimensional continuous action space with one action per stock. In the default implementation each action lies in `[-1, 1]` and is scaled by `hmax` into a number of shares to buy or sell. The environment tracks cash, holdings, and portfolio value, computes a reward from the change in portfolio value, and applies transaction-cost penalties.
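
The bookkeeping described above (cash, holdings, transaction costs, and a reward equal to the change in portfolio value) can be sketched in plain Python. This is a simplified illustration, not the `StockTradingEnv` source: the function name `step_portfolio`, the single `cost_pct` parameter, and the order in which trades are applied are assumptions.

```python
def step_portfolio(cash, holdings, prices, share_deltas, next_prices, cost_pct=0.001):
    """Apply signed share trades at `prices`, then value the book at `next_prices`.

    Returns (cash, holdings, reward), where reward is the change in portfolio value.
    """
    value_before = cash + sum(h * p for h, p in zip(holdings, prices))
    new_holdings = list(holdings)
    for i, delta in enumerate(share_deltas):
        if delta < 0:
            # Sell: limited to the shares actually held (no short selling).
            qty = min(-delta, new_holdings[i])
            cash += qty * prices[i] * (1 - cost_pct)
            new_holdings[i] -= qty
        elif delta > 0:
            # Buy: limited to what the cash balance can afford, cost included.
            affordable = int(cash // (prices[i] * (1 + cost_pct)))
            qty = min(delta, affordable)
            cash -= qty * prices[i] * (1 + cost_pct)
            new_holdings[i] += qty
    value_after = cash + sum(h * p for h, p in zip(new_holdings, next_prices))
    return cash, new_holdings, value_after - value_before


# Buy 5 shares at 100, then the price moves to 110.
cash, holdings, reward = step_portfolio(1000.0, [0], [100.0], [5], [110.0])

# Try to sell 10 shares while holding only 3: the sale is capped at 3.
_, capped_holdings, _ = step_portfolio(0.0, [3], [50.0], [-10], [50.0])
```

In the first call the reward is the price gain on five shares minus the buy-side cost, which shows how transaction costs reduce the reward even on a profitable move.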

The repository ships several variants for research experiments:

| Variant | File | Purpose |
|---|---|---|
| `StockTradingEnv` | env_stocktrading.py | Default equity trading env with turbulence-aware risk control |
| NumPy port | env_stocktrading_np.py | Lightweight NumPy reimplementation for faster stepping |
| Cash-penalty variant | env_stocktrading_cashpenalty.py | Penalizes idle cash to encourage full capital deployment |
| Stop-loss variant | env_stocktrading_stoploss.py | Triggers forced exits when a position drawdown threshold is breached |
| Paper-trading env | env_stock_papertrading.py | Live Alpaca integration for paper-account execution |

The NeurIPS 2018 tutorial ([finrl/applications/Stock_NeurIPS2018/README.md](finrl/applications/Stock_NeurIPS2018/README.md)) walks through this env end-to-end: a data notebook produces `train.csv` and `trade.csv`; the training notebook wraps the env with `e_train_gym.get_sb_env()` for Stable Baselines 3; the backtest notebook compares the trained agent against Mean-Variance Optimization and the DJIA benchmark.

State construction concatenates the current balance, holdings vector, closing prices, technical indicators (MACD, Bollinger bands, RSI, DX, 30- and 60-day SMAs, per the [README.md](README.md) feature list), and turbulence into a single observation vector. The reset method, however, is sensitive to the Gym/Gymnasium API split, which is the source of one of the most common user errors (issue #1013).

## Cryptocurrency and Portfolio Allocation Environments

Beyond equities, FinRL provides environments for two other asset classes.

**Cryptocurrency trading** lives in `finrl/meta/env_cryptocurrency_trading/`. It mirrors the stock-trading env but is designed for 24/7 crypto markets with no turbulence gating. CCXT is the canonical data source, as listed in the [README.md](README.md) data-source table, supporting 1-minute OHLCV with exchange-specific request limits.

**Portfolio allocation** lives in `finrl/meta/env_portfolio_allocation/`, and the related `env_portfolio_optimization` environment follows a different action convention. According to [finrl/meta/env_portfolio_optimization/README.md](finrl/meta/env_portfolio_optimization/README.md), that environment expects a *portfolio vector*: a 1-D `Box` of shape `(n+1,)` where the leading element is the cash weight and the remaining `n` elements are the allocation weights across `n` assets. The input DataFrame requires a `date` column and a `tic` column, and weights are renormalized to sum to 1 after each step.
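
To make the portfolio-vector convention concrete, the sketch below normalizes a raw action of length `n+1` so that the weights sum to 1. Softmax is used here as one common normalization choice; whether the environment uses exactly this function is an assumption, and `normalize_weights` is a hypothetical helper name.

```python
import math


def normalize_weights(action):
    """Map a raw action of length n+1 to non-negative weights that sum to 1."""
    shift = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - shift) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


raw_action = [0.0, 1.0, 2.0, 1.0]  # index 0 is cash, followed by n = 3 assets
weights = normalize_weights(raw_action)
cash_weight, asset_weights = weights[0], weights[1:]
```

The leading element is read as the cash weight and the remaining three as asset allocations, matching the `(n+1,)` shape described above.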

A separate portfolio-optimization agent set under [finrl/agents/portfolio_optimization/README.md](finrl/agents/portfolio_optimization/README.md) implements architectures such as EI³ (Inception-style CNN for multi-scale temporal features) and the classic Jiang et al. 2017 DPG framework for portfolio management.

```mermaid
flowchart LR
    A[Market Data OHLCV + Indicators] --> B[FinRL Market Environment]
    B --> C[Observation Vector]
    C --> D[DRL Agent SB3 / ElegantRL / RLlib]
    D --> E[Action Continuous / Portfolio Vector]
    E --> B
    B --> F[Reward + Portfolio Value]
```

## Common Failure Modes

Community issues cluster around three recurring failure patterns that any new user of the market environments should be aware of:

1. **Gymnasium API drift (issue #1013).** `StockTradingEnv.reset()` does not accept a `seed` keyword in its current implementation, so calling `env_train.get_sb_env()` under Gymnasium 0.28.1 raises `unexpected keyword argument 'seed'`. Pinning to an older `gym` version or removing the `seed` argument at the call site is the standard workaround.

2. **Off-policy / on-policy buffer mismatch (issue #1395).** FinRL's training callback logs information from `model.rollout_buffer`, which is populated by on-policy algorithms (A2C, PPO). Off-policy algorithms such as DDPG, TD3, and SAC use `replay_buffer` instead, producing an `AttributeError` at log time. The fix is to guard the logging code on the buffer attribute actually present, or branch on `ON_POLICY_MODELS` versus `OFF_POLICY_MODELS` as defined in [finrl/agents/elegantrl/models.py](finrl/agents/elegantrl/models.py).

3. **Short-selling control (issue #1255).** There is no first-class `allow_short_selling` flag in `StockTradingEnv` today. To prevent negative actions, the suggested approaches are to clip actions to non-negative values or to inflate the sell transaction cost in the env constructor so the agent self-discovers that shorting is unprofitable.

Additional reported issues (#671, #206, #222, #696) trace to environment shape mismatches, pyfolio/zipline import errors, and NaN propagation in the observation when technical indicators are computed on insufficient lookback windows. The v0.3.8 tutorial ([examples/README.md](examples/README.md)) sidesteps most of these by fixing the data window (2014–2025 for training, 2026-01-01 to 2026-03-20 for trading) and using a curated indicator set.

## See Also

- [FinRL Repository Overview](README.md)
- [FinRL-Meta: Market Environments and Benchmarks](https://github.com/AI4Finance-Foundation/FinRL-Meta)
- [FinRL-X / FinRL-Trading (next-generation stack)](https://github.com/AI4Finance-Foundation/FinRL-Trading)
- Stock Trading 2026 tutorial scripts under `examples/`
- Stable Baselines 3 integration via the `get_sb_env()` helper on the environment instance (e.g. `e_train_gym.get_sb_env()`)

---

<a id='page-3'></a>

## DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors

### Related Pages

Related topics: [FinRL Architecture, Three-Layer Framework & Project Layout](#page-1), [Market Environments: StockTradingEnv, Crypto, Portfolio & Variants](#page-2), [Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [finrl/agents/stablebaselines3/models.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/stablebaselines3/models.py)
- [finrl/agents/elegantrl/models.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/elegantrl/models.py)
- [finrl/agents/rllib/models.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/rllib/models.py)
- [finrl/agents/portfolio_optimization/models.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/portfolio_optimization/models.py)
- [finrl/agents/portfolio_optimization/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/agents/portfolio_optimization/README.md)
- [finrl/applications/Stock_NeurIPS2018/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/applications/Stock_NeurIPS2018/README.md)
- [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md)
- [examples/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/README.md)
- [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md)
</details>


## Overview

FinRL ships a pluggable agent layer that lets users train Deep Reinforcement Learning (DRL) trading policies using three external libraries. The framework follows a three-layer architecture: **applications** (trading tasks), **agents** (DRL algorithms), and **meta** (market environments). The `agents` layer is the abstraction point where the user picks a backend; the same `StockTradingEnv` or `PortfolioOptimizationEnv` can be reused across all backends.

Source: [finrl/README.md:1-30]()

The official FinRL Stock Trading 2026 tutorial exercises five DRL algorithms on Dow 30 data — `A2C, DDPG, PPO, SAC, TD3` — all backed by Stable Baselines 3 (SB3). Source: [examples/README.md:30-50]().

## Agent Backends

### Stable Baselines 3 (`finrl/agents/stablebaselines3/models.py`)

SB3 is the default and most commonly used backend. The `DRLAgent` class wraps SB3 model classes and exposes `get_model`, `train_model`, `DRL_prediction`, and `DRL_prediction_load_from_file`.

The supported model registry is:

```python
MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}
```

Source: [finrl/agents/stablebaselines3/models.py:18-19]()

Default hyperparameters are pulled from `finrl/config.py` via:

```python
MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}
```

Source: [finrl/agents/stablebaselines3/models.py:21]()
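
The dictionary comprehension above looks up one `<NAME>_PARAMS` attribute per registered model. The sketch below reproduces the pattern with a stand-in `config` object; the parameter values and the placeholder model classes are illustrative assumptions, not the contents of `finrl/config.py`.

```python
from types import SimpleNamespace

# Hypothetical stand-in for finrl.config; the values are illustrative only.
config = SimpleNamespace(
    A2C_PARAMS={"n_steps": 5, "learning_rate": 0.0007},
    PPO_PARAMS={"n_steps": 2048, "learning_rate": 0.00025},
)

# Placeholders for the SB3 model classes, to keep the example dependency-free.
MODELS = {"a2c": object, "ppo": object}

# Same lookup pattern: "a2c" -> config.A2C_PARAMS, "ppo" -> config.PPO_PARAMS.
MODEL_KWARGS = {name: config.__dict__[f"{name.upper()}_PARAMS"] for name in MODELS}

# Overriding a single default without mutating the shared dictionary.
custom_a2c = {**MODEL_KWARGS["a2c"], "learning_rate": 0.0001}
```

One consequence of this pattern is that every key in `MODELS` needs a matching `*_PARAMS` attribute in the config, otherwise the comprehension raises `KeyError` at import time.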

Action noise for off-policy methods (DDPG/TD3/SAC) is provided through `NormalActionNoise` and `OrnsteinUhlenbeckActionNoise`. The training loop integrates a `TensorboardCallback` (subclass of SB3's `BaseCallback`) for logging. Source: [finrl/agents/stablebaselines3/models.py:11-16]()

### ElegantRL (`finrl/agents/elegantrl/models.py`)

ElegantRL provides a lightweight, single-file DRL library. The `DRLAgent` class here is initialized with `env`, `price_array`, `tech_array`, and `turbulence_array`, and supports the same five algorithms plus the explicit on-/off-policy classification:

```python
MODELS = {"ddpg": AgentDDPG, "td3": AgentTD3, "sac": AgentSAC,
          "ppo": AgentPPO, "a2c": AgentA2C}
OFF_POLICY_MODELS = ["ddpg", "td3", "sac"]
ON_POLICY_MODELS = ["ppo"]
```

Source: [finrl/agents/elegantrl/models.py:14-21]()
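
Note that, as listed above, `a2c` appears in `MODELS` but in neither policy list. A small hypothetical helper (`policy_kind` is not part of FinRL) shows how code branching on these lists might treat that case explicitly rather than silently defaulting:

```python
MODEL_NAMES = ["ddpg", "td3", "sac", "ppo", "a2c"]  # keys of MODELS above
OFF_POLICY_MODELS = ["ddpg", "td3", "sac"]
ON_POLICY_MODELS = ["ppo"]


def policy_kind(name):
    """Classify a model name using the two lists; flag names in neither list."""
    if name in OFF_POLICY_MODELS:
        return "off-policy"
    if name in ON_POLICY_MODELS:
        return "on-policy"
    if name in MODEL_NAMES:
        return "unclassified"
    raise KeyError(f"unknown model name: {name}")


kinds = {name: policy_kind(name) for name in MODEL_NAMES}
```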

`train_model` delegates to ElegantRL's `train_agent`, while `DRL_prediction` rebuilds an `actor` network from `act.pth` and `Config(agent_class=agent_class, env_class=env_class, env_args=env_args)`. Source: [finrl/agents/elegantrl/models.py:54-92]()

### RLlib (`finrl/agents/rllib/models.py`)

RLlib (Ray) is recommended for distributed, multi-agent, or production-scale training. The model registry mirrors the other backends:

```python
MODELS = {"a2c": a2c, "ddpg": ddpg, "td3": td3, "sac": sac, "ppo": ppo}
```

Source: [finrl/agents/rllib/models.py:8]()

Each algorithm exposes a `*Trainer` (e.g. `PPOTrainer`, `DDPGTrainer`) used inside `DRL_prediction`. The config is built by copying `*_DEFAULT_CONFIG` and injecting `env_config` containing `price_array`, `tech_array`, `turbulence_array`, and `if_train`. Source: [finrl/agents/rllib/models.py:60-90]()

### Portfolio Optimization Agent (`finrl/agents/portfolio_optimization/models.py`)

A separate, dedicated agent for the `PortfolioOptimizationEnv`. It only ships a `PolicyGradient` ("pg") algorithm and uses an `EIIE` convolutional policy architecture. The example in the module README configures `model_kwargs={"lr": 0.01, "policy": EIIE}` and trains for episodes instead of timesteps. Source: [finrl/agents/portfolio_optimization/README.md:10-40]()

## Training Workflow

```mermaid
flowchart LR
    A[Market Data CSV] --> B[Data Processor / Indicators]
    B --> C[Train/Test Split]
    C --> D[StockTradingEnv or PortfolioOptimizationEnv]
    D --> E{Pick Agent Backend}
    E -->|SB3| F1[stable_baselines3]
    E -->|ElegantRL| F2[elegantrl.agents]
    E -->|RLlib| F3[ray.rllib]
    F1 --> G[Trained Model .zip / .pth]
    F2 --> G
    F3 --> G
    G --> H[DRL_prediction]
    H --> I[Backtest vs DJIA / MVO]
```

The train/trade split is performed by `data_split` from `finrl.meta.preprocessor.preprocessors`, which the SB3 agent module imports alongside its `TensorboardCallback`. Source: [finrl/agents/stablebaselines3/models.py:13]()

## Common Training Errors

The community issue tracker surfaces several recurring failure modes that map directly to the agent layer.

### 1. `StockTradingEnv.reset() got an unexpected keyword argument 'seed'`

Reported on Google Colab with Gymnasium 0.28.1 in the `Stock_NeurIPS2018_SB3.ipynb` tutorial. Gymnasium-style callers invoke `reset(seed=...)`, but the `reset` method in `env_stocktrading.py` did not yet accept that keyword. The fix is to either pin an older `gym` release or ensure the env's `reset` accepts a `seed` kwarg. Source: [github.com/AI4Finance-Foundation/FinRL/issues/1013]()

### 2. `rollout_buffer` logging error for off-policy algorithms (DDPG/TD3/SAC)

In FinRL 0.3.8, the `TensorboardCallback` (in `finrl/agents/stablebaselines3/models.py`) records metrics that assume an on-policy `rollout_buffer` exists. Off-policy algorithms instead expose a `replay_buffer`, causing `AttributeError` on the first step. Workarounds include guarding the log with `hasattr(self.model, "rollout_buffer")` or registering a separate callback. Source: [github.com/AI4Finance-Foundation/FinRL/issues/1395]()
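
The attribute guard mentioned above can be sketched with stand-in model classes. These classes and the `latest_rewards` helper are hypothetical and only mimic the attribute names; they are not SB3 or FinRL code.

```python
from types import SimpleNamespace


class OnPolicyModel:
    """Stand-in for an on-policy model (A2C, PPO): has a rollout_buffer."""

    def __init__(self):
        self.rollout_buffer = SimpleNamespace(rewards=[0.1, 0.3])


class OffPolicyModel:
    """Stand-in for an off-policy model (DDPG, TD3, SAC): has a replay_buffer."""

    def __init__(self):
        self.replay_buffer = SimpleNamespace(rewards=[0.2, 0.4, 0.6])


def latest_rewards(model):
    """Read rewards from whichever buffer attribute the model actually has."""
    for attr in ("rollout_buffer", "replay_buffer"):
        buffer = getattr(model, attr, None)
        if buffer is not None:
            return attr, list(buffer.rewards)
    return None, []  # nothing to log; skip instead of raising AttributeError


on_source, on_rewards = latest_rewards(OnPolicyModel())
off_source, off_rewards = latest_rewards(OffPolicyModel())
```

Using `getattr` with a default keeps a single callback usable for both algorithm families, at the cost of silently skipping models that expose neither buffer.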

### 3. `Normal(loc, scale) invalid values` for single-stock training (issue #696)

A `ValueError: Expected parameter loc ... to satisfy the constraint Real()` is raised when the actor network outputs a non-finite mean (`loc` is the mean of the `Normal` distribution), which is reported for tiny state spaces (e.g. one stock). Suggested mitigations are gradient clipping, a lower learning rate, or a bounded action policy. Source: [github.com/AI4Finance-Foundation/FinRL/issues/696]()

### 4. Shape mismatch in `main.py --mode=train` (issues #206, #222)

`cannot copy sequence with size 292 to array axis with dimension 301` and similar `Could not broadcast input array from shape` errors are almost always caused by `df` and `tech_array` having different lengths after indicator preprocessing. Verify that `data_split` is called **after** indicators are added and that the turbulence index is aligned to `df.index`. Source: [github.com/AI4Finance-Foundation/FinRL/issues/206](), [github.com/AI4Finance-Foundation/FinRL/issues/222]()
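
One way to look for the misalignment described above is to count rows per date and flag dates where not every ticker is present. This is a diagnostic sketch under the assumption that a ragged date/ticker grid is the cause; `find_misaligned_dates` is a hypothetical helper, and the rows are illustrative.

```python
from collections import Counter


def find_misaligned_dates(rows, n_tickers):
    """Return the dates whose row count differs from the expected ticker count."""
    counts = Counter(date for date, _tic in rows)
    return sorted(date for date, count in counts.items() if count != n_tickers)


rows = [
    ("2024-01-02", "AAPL"),
    ("2024-01-02", "MSFT"),
    ("2024-01-03", "AAPL"),  # MSFT is missing on this date
    ("2024-01-04", "AAPL"),
    ("2024-01-04", "MSFT"),
]
bad_dates = find_misaligned_dates(rows, n_tickers=2)
```

An empty result means every date carries one row per ticker, so arrays built per date and per ticker should have matching lengths.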

### 5. `AttributeError` from `main.py` (issue #671)

A `SystemExit(main())` crash on line 152 of `finrl/main.py` typically means `finrl.config` is missing required tickers (e.g. `DOW_30_TICKER`) or the data downloader returned an empty DataFrame. Source: [github.com/AI4Finance-Foundation/FinRL/issues/671]()

## See Also

- Market Environments (`StockTradingEnv`, `PortfolioOptimizationEnv`)
- Data Processors and Technical Indicators
- Paper Trading via Alpaca
- FinRL-X / FinRL-Trading (next-generation stack)

---

<a id='page-4'></a>

## Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications

### Related Pages

Related topics: [FinRL Architecture, Three-Layer Framework & Project Layout](#page-1), [Market Environments: StockTradingEnv, Crypto, Portfolio & Variants](#page-2), [DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [finrl/main.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/main.py)
- [finrl/train.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/train.py)
- [finrl/test.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/test.py)
- [finrl/trade.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/trade.py)
- [finrl/config.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config.py)
- [finrl/config_tickers.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config_tickers.py)
- [finrl/plot.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/plot.py)
- [finrl/meta/data_processor.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/data_processor.py)
- [finrl/meta/data_processors/processor_yahoofinance.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/data_processors/processor_yahoofinance.py)
- [finrl/meta/paper_trading/alpaca.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/paper_trading/alpaca.py)
- [examples/FinRL_StockTrading_2026_1_data.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/FinRL_StockTrading_2026_1_data.py)
- [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md)
</details>


## Overview

FinRL provides a complete train–test–trade workflow that ties together market data ingestion, feature engineering, DRL agent training, backtesting, and (optionally) live paper trading through the Alpaca brokerage. The framework is organized as a three-layer architecture: `applications` (financial tasks such as stock trading, crypto trading, portfolio allocation, high-frequency trading), `agents` (DRL algorithms from ElegantRL, RLlib, and Stable Baselines 3), and `meta` (Gym-style market environments, data processors, and preprocessors) — as documented in [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).

The end-to-end pipeline is orchestrated by [finrl/main.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/main.py), which delegates to the three entry points described in [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md): `train.py`, `test.py`, and `trade.py`. Users can also follow the streamlined 2026 tutorial split into three scripts ([examples/FinRL_StockTrading_2026_1_data.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/FinRL_StockTrading_2026_1_data.py), `_2_train.py`, `_3_Backtest.py`).

## Data Pipeline

### Architecture and Data Sources

The data layer is unified by [finrl/meta/data_processor.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/data_processor.py), which exposes a `DataProcessor` wrapper that delegates to provider-specific backends (Yahoo Finance, Alpaca, CCXT, JoinQuant, WRDS, etc.) under [finrl/meta/data_processors/](https://github.com/AI4Finance-Foundation/FinRL/tree/main/finrl/meta/data_processors). The [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md) data-source table lists coverage including AkShare, Alpaca, Baostock, Binance, CCXT, EODhistoricaldata, IEXCloud, JoinQuant, QuantConnect, RiceQuant, Sinopac, Tushare, WRDS, and YahooFinance.

Each processor implements a common contract: `download_data`, `clean_data`, `add_technical_indicator`, `add_vix`, and `df_to_array`. The Yahoo Finance processor in [finrl/meta/data_processors/processor_yahoofinance.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/data_processors/processor_yahoofinance.py) downloads OHLCV history, handles multi-index columns returned by `yfinance`, and adds indicators such as MACD, RSI, CCI, ADX, and the turbulence index.
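
The common contract can be expressed as a structural interface. In the sketch below only the five method names come from the text above; the parameter lists, the `Processor` protocol, and the `ToyProcessor` class (which works on lists of dicts instead of DataFrames) are assumptions made for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Processor(Protocol):
    """Hypothetical interface; parameter names are assumed, not FinRL's."""

    def download_data(self, ticker_list, start_date, end_date, time_interval): ...
    def clean_data(self, rows): ...
    def add_technical_indicator(self, rows, tech_indicator_list): ...
    def add_vix(self, rows): ...
    def df_to_array(self, rows, tech_indicator_list, if_vix): ...


class ToyProcessor:
    """In-memory provider that fabricates constant rows for demonstration."""

    def download_data(self, ticker_list, start_date, end_date, time_interval):
        return [{"tic": t, "date": start_date, "close": 100.0} for t in ticker_list]

    def clean_data(self, rows):
        return [r for r in rows if r["close"] is not None]

    def add_technical_indicator(self, rows, tech_indicator_list):
        return [{**r, **{name: 0.0 for name in tech_indicator_list}} for r in rows]

    def add_vix(self, rows):
        return [{**r, "vix": 0.0} for r in rows]

    def df_to_array(self, rows, tech_indicator_list, if_vix):
        prices = [r["close"] for r in rows]
        tech = [[r[name] for name in tech_indicator_list] for r in rows]
        risk = [r["vix"] for r in rows] if if_vix else []
        return prices, tech, risk


proc = ToyProcessor()
rows = proc.download_data(["AAPL", "MSFT"], "2024-01-02", "2024-01-03", "1D")
rows = proc.add_vix(proc.add_technical_indicator(proc.clean_data(rows), ["macd"]))
prices, tech, risk = proc.df_to_array(rows, ["macd"], if_vix=True)
```

Because the protocol is structural, any provider class that defines these five methods satisfies it without inheriting from a shared base class.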

### Feature Engineering and Splits

The 2026 data script downloads DOW 30 tickers from Yahoo Finance, attaches technical indicators plus the VIX and the turbulence index, and partitions the data into a training set (2014–2025) and a trading set (2026-01-01 to 2026-03-20), saved as `train_data.csv` and `trade_data.csv` respectively; see [examples/FinRL_StockTrading_2026_1_data.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/examples/FinRL_StockTrading_2026_1_data.py). The default indicator set referenced in [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md) includes `macd`, `boll_ub`, `boll_lb`, `rsi_30`, `dx_30`, `close_30_sma`, and `close_60_sma`, but users can extend this list.
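
A date-based train/trade partition of this kind might look like the sketch below. The helper `split_by_date` is hypothetical and works on plain dictionaries; the half-open `[start, end)` boundary convention is an assumption, and the tutorial script's own splitting logic may treat boundaries differently.

```python
def split_by_date(rows, train_start, train_end, trade_start, trade_end):
    """Partition rows into train and trade sets by ISO date string.

    Assumption: each window is half-open, [start, end), and dates are
    'YYYY-MM-DD' strings, which compare correctly as plain strings.
    """
    train = [r for r in rows if train_start <= r["date"] < train_end]
    trade = [r for r in rows if trade_start <= r["date"] < trade_end]
    return train, trade


# Illustrative rows only; real data would carry OHLCV and indicator columns.
sample_rows = [
    {"date": "2013-12-31", "tic": "AAPL"},
    {"date": "2014-01-02", "tic": "AAPL"},
    {"date": "2025-12-31", "tic": "AAPL"},
    {"date": "2026-01-02", "tic": "AAPL"},
    {"date": "2026-03-20", "tic": "AAPL"},
]
train_rows, trade_rows = split_by_date(
    sample_rows, "2014-01-01", "2026-01-01", "2026-01-01", "2026-03-21"
)
```

Keeping the two windows disjoint matters: any overlap would leak trading-period prices into training and inflate backtest results.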

```mermaid
flowchart LR
    A[Data Source<br/>Yahoo / Alpaca / CCXT] --> B[DataProcessor<br/>download_data]
    B --> C[clean_data]
    C --> D[add_technical_indicator<br/>+ VIX + turbulence]
    D --> E[Train / Trade Split<br/>CSV files]
    E --> F[StockTradingEnv]
    F --> G[DRL Agent<br/>A2C / PPO / DDPG / SAC / TD3]
    G --> H[Backtest / Plot]
    G --> I[Alpaca Paper Trading]
```

## End-to-End Application Workflow

### Train / Test / Trade Orchestration

[finrl/main.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/main.py) is the command-line entry point. It accepts `--mode=train`, `--mode=test`, or `--mode=trade` and routes to the matching module. The pipeline is parameterised by [finrl/config.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config.py) (training window, time intervals, technical-indicator list, brokerage parameters, model hyperparameters) and [finrl/config_tickers.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config_tickers.py) (ticker universes such as the DOW 30).
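
The mode routing can be pictured with a small `argparse` dispatcher. This is a sketch under stated assumptions, not the contents of `finrl/main.py`: the handler functions and the `dispatch` name are hypothetical placeholders that simply return their mode name.

```python
import argparse


# Hypothetical handlers; the real modules build environments and agents.
def run_train():
    return "train"


def run_test():
    return "test"


def run_trade():
    return "trade"


HANDLERS = {"train": run_train, "test": run_test, "trade": run_trade}


def build_parser():
    parser = argparse.ArgumentParser(description="Mode dispatcher sketch")
    # Restricting choices makes argparse reject unknown modes up front.
    parser.add_argument("--mode", choices=sorted(HANDLERS), required=True)
    return parser


def dispatch(argv=None):
    """Parse the command line and call the handler for the chosen mode."""
    args = build_parser().parse_args(argv)
    return HANDLERS[args.mode]()
```

Calling `dispatch(["--mode=train"])` runs the training handler; an unrecognised mode makes `argparse` exit with a usage message instead of silently doing nothing.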

| Step | Module | Responsibility |
|------|--------|----------------|
| 1. Configure | [finrl/config.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config.py) | Define time ranges, indicators, agent hyperparameters |
| 2. Train | [finrl/train.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/train.py) | Build `StockTradingEnv`, call `DRLAgent.train_model`, save checkpoints under `trained_models/` |
| 3. Test | [finrl/test.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/test.py) | Replay a trained policy on the test set and emit account-value / action logs |
| 4. Trade / Backtest | [finrl/trade.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/trade.py), [finrl/plot.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/plot.py) | Compute Sharpe ratio, cumulative return, and compare with MVO / DJIA benchmarks |

The `trade.py` step generates actions with the same `DRLAgent.DRL_prediction` method used for testing, then calls `backtest_stats` and `backtest_plot` from [finrl/plot.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/plot.py) for performance analytics. The 2026 tutorial mirrors this with a dedicated `FinRL_StockTrading_2026_3_Backtest.py` that compares agent returns against Mean-Variance Optimisation and the DJIA; see the v0.3.8 release notes referenced in [README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/README.md).
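
For readers unfamiliar with the two headline metrics, the sketch below computes them from a series of account values. It is an independent illustration, not the `backtest_stats` implementation: it assumes daily observations, 252 trading periods per year, a zero risk-free rate, and the sample standard deviation.

```python
import math
import statistics


def period_returns(account_values):
    """Simple returns between consecutive account values."""
    return [b / a - 1.0 for a, b in zip(account_values, account_values[1:])]


def cumulative_return(account_values):
    """Total return from the first to the last account value."""
    return account_values[-1] / account_values[0] - 1.0


def sharpe_ratio(account_values, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    rets = period_returns(account_values)
    if len(rets) < 2:
        raise ValueError("need at least three account values")
    spread = statistics.stdev(rets)
    if spread == 0:
        return float("nan")  # undefined when returns never vary
    return math.sqrt(periods_per_year) * statistics.mean(rets) / spread
```

Running the same functions on a benchmark series (for example an index's normalised closing values) gives a like-for-like comparison with the agent's account values.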

### Applications

The `finrl/applications` directory contains four task families referenced in [finrl/README.md](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/README.md): `stock_trading`, `cryptocurrency_trading`, `portfolio_allocation`, and `high_frequency_trading`. Each application reuses the same three-layer structure and swaps in a domain-specific `Env_*` class from `finrl/meta/`.

## Paper Trading (Alpaca)

### Architecture

[finrl/meta/paper_trading/alpaca.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/meta/paper_trading/alpaca.py) implements an `AlpacaPaperTrading` class that authenticates against the Alpaca paper-trading API, streams intraday bars on a threaded schedule, and dispatches buy/sell orders to a DRL-trained policy checkpoint. The default universe and timestamps are typically derived from [finrl/config_tickers.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config_tickers.py) and the indicators configured in [finrl/config.py](https://github.com/AI4Finance-Foundation/FinRL/blob/main/finrl/config.py).

### Known Threading Pitfalls

The community has surfaced two related defects in the order-submission path:

1. **Thread target called immediately** (issue [#1399](https://github.com/AI4Finance-Foundation/FinRL/issues/1399)): the original code used `Thread(target=self.submitOrder(...))`, which invokes `submitOrder` synchronously on the calling thread and passes its return value (`None`) as the target, so the spawned thread does nothing and orders are submitted serially. The corrected pattern is `Thread(target=self.submitOrder, args=(...))`.
2. **Unread response lists** (issue [#1414](https://github.com/AI4Finance-Foundation/FinRL/issues/1414)): the `respSO` lists populated by each order thread are joined but never read, so submission errors or fills are silently dropped. Contributors are advised to inspect `respSO` after `join()` to surface failures.
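
Both points above can be illustrated with a self-contained sketch. The function `submit_order` is a hypothetical stand-in that never contacts a broker; it only appends a success flag to the response list, in the spirit of the `respSO` lists mentioned above. The argument order and the flag convention are assumptions.

```python
import threading


def submit_order(qty, symbol, side, resp):
    """Hypothetical stand-in for a broker call: records success or failure."""
    resp.append(qty > 0)  # assumption: a non-positive quantity is rejected


def submit_all(orders):
    """Submit orders on worker threads and return the symbols that failed."""
    threads, responses = [], []
    for qty, symbol, side in orders:
        resp = []
        # Pass the callable and its arguments separately. Writing
        # target=submit_order(...) would run the call here, on this thread.
        worker = threading.Thread(
            target=submit_order, args=(qty, symbol, side, resp)
        )
        worker.start()
        threads.append(worker)
        responses.append((symbol, resp))

    for worker in threads:
        worker.join()

    # Read every response list after join() so failures are not dropped.
    return [symbol for symbol, resp in responses if not resp or not resp[0]]
```

A caller can then log or retry whatever `submit_all` returns rather than assuming every order went through.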

### Market-State Safety

[Issue #1412](https://github.com/AI4Finance-Foundation/FinRL/issues/1412) highlights that `StockTradingEnv` schedules trades from the timestamp alone and does not verify whether the exchange is actually open. This can cause silent failures around holidays and DST transitions. A pre-trade market-state check (e.g., via a Headless Oracle–style signed manifest) has been proposed to close this gap.
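
A very simple form of pre-trade check is sketched below. It is not the mechanism proposed in the issue: the function name, the 09:30 to 16:00 regular session, and the caller-supplied holiday set are all assumptions, the timestamp is expected to be already converted to exchange-local time, and early closes are not modelled. A production system would more likely query the broker's market clock or calendar.

```python
from __future__ import annotations

from datetime import date, datetime, time

# Assumed regular session hours in exchange-local time.
REGULAR_OPEN = time(9, 30)
REGULAR_CLOSE = time(16, 0)


def is_regular_session(local_ts: datetime, holidays: frozenset[date]) -> bool:
    """Return True if local_ts falls inside an assumed regular session.

    local_ts must already be in the exchange's local time zone, so that
    DST shifts are handled by the conversion step, not by this function.
    """
    if local_ts.weekday() >= 5:  # Saturday or Sunday
        return False
    if local_ts.date() in holidays:
        return False
    return REGULAR_OPEN <= local_ts.time() < REGULAR_CLOSE
```

Gating order submission on a check like this turns a silent failure into an explicit "market closed, skip this step" decision.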

## Common Failure Modes

Beyond the Alpaca issues above, users frequently hit:

- **`reset(seed=...)` keyword error** ([#1013](https://github.com/AI4Finance-Foundation/FinRL/issues/1013)): `StockTradingEnv.reset` does not accept a `seed` keyword, which breaks when the environment is used with newer Gymnasium (`gymnasium>=0.28`). Pin to `gym==0.21` or call `env.reset()` without a seed.
- **Off-policy logging crash** ([#1395](https://github.com/AI4Finance-Foundation/FinRL/issues/1395)): callbacks that read `model.rollout_buffer` fail for DDPG / TD3 / SAC, which use `replay_buffer` instead. Gate the callback on `hasattr(model, "rollout_buffer")`.
- **Shape mismatches** ([#222](https://github.com/AI4Finance-Foundation/FinRL/issues/222), [#206](https://github.com/AI4Finance-Foundation/FinRL/issues/206)): mismatches between feature count and price array length usually mean the technical-indicator list and the indicator column names in the CSV do not agree.
- **Invalid `Normal` distribution** ([#696](https://github.com/AI4Finance-Foundation/FinRL/issues/696)): NaNs in state arrays — usually from unscaled prices or missing rows in a single-stock environment — produce non-finite `loc` values.
- **Short selling** ([#1255](https://github.com/AI4Finance-Foundation/FinRL/issues/1255)): a long-only restriction is being introduced via an `allow_short_selling` flag on `StockTradingEnv`.
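
Three of the failure modes above lend themselves to cheap defensive checks, sketched here with the standard library only. All helper names are hypothetical, and the `SimpleNamespace` objects are stand-ins for trained models, not Stable Baselines 3 classes.

```python
import math
from types import SimpleNamespace


def experience_buffer(model):
    """Return the rollout or replay buffer, whichever the model exposes."""
    for name in ("rollout_buffer", "replay_buffer"):
        buffer = getattr(model, name, None)
        if buffer is not None:
            return buffer
    return None  # caller should skip buffer-based logging


def missing_indicators(indicator_list, columns):
    """Configured indicators that have no matching column in the data."""
    present = set(columns)
    return [name for name in indicator_list if name not in present]


def non_finite_positions(state):
    """Indices of NaN or infinite entries in a flat state vector."""
    return [i for i, value in enumerate(state) if not math.isfinite(value)]


# Stand-in models (assumption): only the buffer attribute matters here.
on_policy_model = SimpleNamespace(rollout_buffer="rollout")
off_policy_model = SimpleNamespace(replay_buffer="replay")
```

Running `missing_indicators` and `non_finite_positions` before training surfaces configuration and data problems as readable messages rather than shape or distribution errors deep inside the agent.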

## See Also

- `Data Processors and Technical Indicators`: detailed coverage of `processor_yahoofinance.py`, `processor_alpaca.py`, `processor_ccxt.py`, and indicator definitions.
- `Market Environments`: `StockTradingEnv`, `CryptoTradingEnv`, and `PortfolioOptimizationEnv` contracts.
- `Agents and Training`: DRL algorithm adapters for ElegantRL, RLlib, and Stable Baselines 3.
- `Hyperparameter Tuning` — `finrl/agents/stablebaselines3/tune_sb3.py` Optuna integration.

---


## Pitfall Log

Project: AI4Finance-Foundation/FinRL

Summary: Found 12 structured pitfall items, including 6 high/blocking items. Top priority: Runtime risk - Runtime risk requires verification.

## 1. Runtime risk - Runtime risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1395

## 2. Runtime risk - Runtime risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1414

## 3. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1412

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/671

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1013

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/AI4Finance-Foundation/FinRL

## 7. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/AI4Finance-Foundation/FinRL

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/AI4Finance-Foundation/FinRL

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/AI4Finance-Foundation/FinRL

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL

