FinRL Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

FinRL


FinRL Architecture, Three-Layer Framework & Project Layout


Overview & Ecosystem Position

FinRL is the original open-source deep reinforcement learning (DRL) framework for finance, positioned as the classic end-to-end research and educational library within the broader AI4Finance ecosystem. It is explicitly distinguished from its successor FinRL-X / FinRL-Trading, which is the next-generation AI-native production stack. Source: README.md.

The ecosystem roadmap identifies four generations of libraries: FinRL-Meta (gym-style market environments), FinRL (classic end-to-end train-test-trade pipeline), ElegantRL (lightweight DRL algorithms), and FinRL-X (production-oriented modular infrastructure). Source: README.md.

Three-Layer Architecture

The core design follows a three-layer architecture that separates trading tasks, RL algorithms, and market environments while keeping the layers coupled, so users can plug in a DRL library and play. Source: finrl/README.md.

flowchart TB
    A["Applications Layer<br/>(cryptocurrency_trading, stock_trading,<br/>portfolio_allocation, high_frequency_trading)"]
    B["Agents Layer<br/>(elegantrl, stablebaselines3, rllib,<br/>portfolio_optimization)"]
    C["Meta Layer<br/>(data_processors, preprocessor,<br/>env_stock_trading, env_cryptocurrency_trading,<br/>env_portfolio_allocation)"]
    A --> B
    B --> C
    C --> A

Applications Layer

This top layer contains the financial tasks and orchestration scripts. The repository currently ships with cryptocurrency_trading, high_frequency_trading, portfolio_allocation, and stock_trading as first-class task folders. The end-to-end train-test-trade pipeline is implemented across train.py, test.py, trade.py, and the entry point main.py. Source: finrl/README.md.

Agents Layer

This layer exposes DRL algorithm integrations through three backends plus a dedicated portfolio-optimization stack:

ElegantRL — exposes AgentDDPG, AgentTD3, AgentSAC, AgentPPO, and AgentA2C through a MODELS dictionary, with OFF_POLICY_MODELS = ["ddpg", "td3", "sac"] and ON_POLICY_MODELS = ["ppo"]. Source: finrl/agents/elegantrl/models.py.
Stable Baselines 3 — used by the FinRL Stock Trading 2026 tutorial to train A2C, DDPG, PPO, SAC, and TD3. Source: examples/README.md.
RLlib — production-grade distributed training backend. Source: finrl/README.md.
Portfolio Optimization agents — implement a custom PolicyGradient algorithm following *Jiang et al*, with optional EIIE convolutional architecture (k_size parameter) and online-learning evaluation mode. Source: finrl/agents/portfolio_optimization/algorithms.py.

Meta Layer (Market Environments & Data)

The bottom layer houses Gymnasium-style market environments, data processors, and preprocessors. Environments are merged from the active FinRL-Meta repository, and include env_stock_trading, env_cryptocurrency_trading, env_portfolio_allocation, and env_portfolio_optimization. Source: finrl/README.md.

The portfolio-optimization environment, for example, expects a dataframe with date and tic columns and an action space shaped (n+1,) representing the cash plus n stock allocation percentages. Source: finrl/meta/env_portfolio_optimization/README.md.

The data-processor layer also supports 14+ external sources including Alpaca, Baostock, CCXT, IEXCloud, JoinQuant, QuantConnect, RiceQuant, Tushare, and Yahoo Finance, with OHLCV plus technical indicators. Source: README.md.

Project File Structure

The top-level layout mirrors the three-layer architecture. Key files include:

Path	Purpose
`finrl/main.py`	CLI entry point for `--mode=train` / `test` / `trade`
`finrl/config.py`	Global configuration: tickers, dates, indicators, hyperparameters
`finrl/config_tickers.py`	Symbol universe definitions (DOW 30, NASDAQ 100, etc.)
`finrl/train.py`	Training pipeline wrapper
`finrl/test.py`	Backtesting / evaluation
`finrl/trade.py`	Live / paper trading orchestration
`finrl/plot.py`	Performance visualization
`finrl/agents/`	DRL algorithm backends
`finrl/meta/`	Environments, preprocessors, data processors
`finrl/applications/`	Stock, crypto, portfolio, HFT task scripts
`examples/`	Standalone tutorials (e.g., Stock Trading 2026)
`unit_tests/`	Environment and downloader tests

Source: finrl/README.md and README.md.

Train-Test-Trade Pipeline & Hyperparameter Tuning

The canonical FinRL workflow, as demonstrated in the v0.3.8 Stock Trading 2026 tutorial, runs in three scripts: data download and preprocessing, DRL agent training (5 algorithms in one pass), and backtesting against baselines such as MVO and DJIA. Source: examples/README.md.

Hyperparameter search is supported via Optuna. The LoggingCallback class in tune_sb3.py tracks the previous_best_value study attribute, prunes trials after a configurable trial_number threshold, and applies a patience window for Sharpe-ratio improvement. Source: finrl/agents/stablebaselines3/tune_sb3.py.
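
The patience rule described above can be sketched in plain Python without Optuna. This is only an illustration of the idea, not the code in tune_sb3.py: the class name PatienceTracker, its update method, and the exact comparison rules are assumptions made for the example.

```python
class PatienceTracker:
    """Hypothetical stand-in for the patience logic of a tuning callback."""

    def __init__(self, threshold: float, trial_number: int, patience: int):
        self.threshold = threshold        # minimum Sharpe improvement that counts
        self.trial_number = trial_number  # trials to run before checks begin
        self.patience = patience          # stalled trials tolerated before stopping
        self.previous_best_value = None
        self.stalled_trials = []

    def update(self, trial_index: int, best_value: float) -> bool:
        """Record the best value so far; return True when the search should stop."""
        previous = self.previous_best_value
        self.previous_best_value = best_value
        if previous is None or trial_index <= self.trial_number:
            return False
        if abs(best_value - previous) < self.threshold:
            self.stalled_trials.append(trial_index)
        return len(self.stalled_trials) > self.patience


tracker = PatienceTracker(threshold=0.01, trial_number=2, patience=2)
best_sharpe_per_trial = [0.50, 0.80, 0.90, 0.90, 0.90, 0.90, 0.90]
stop_at = None
for index, best in enumerate(best_sharpe_per_trial):
    if tracker.update(index, best):
        stop_at = index
        break
print("stop after trial", stop_at)
```

In a real study the same bookkeeping would live inside the callback that Optuna invokes after each trial, with the best value read from the study object.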

Known Limitations & Community-Reported Issues

Several recurring community issues map directly to architectural seams in this three-layer coupling:

Gymnasium API drift: StockTradingEnv.reset() does not accept the seed keyword that Gymnasium-style callers pass, so older tutorials break under Gymnasium 0.28.1. Source: community issue #1013.
Off-policy vs. on-policy callback mismatch: FinRL 0.3.8 logging assumes a rollout_buffer, but DDPG/TD3/SAC use a replay_buffer, causing training failures. Source: community issue #1395.
Threading bug in paper trading: paper_trading/alpaca.py called submitOrder immediately instead of passing it as the thread target argument. Source: community issues #1399 and #1414.
Short-selling controls: A proposed allow_short_selling: bool = True parameter would extend StockTradingEnv action bounds. Source: community issue #1255.

These issues reflect the framework's status as a research prototype: the README itself redirects production users to FinRL-X / FinRL-Trading, which is described as fully decoupled, type-safe (Pydantic), and production-oriented. Source: README.md.

Market Environments: StockTradingEnv, Crypto, Portfolio & Variants


Overview and Role in the FinRL Stack

FinRL's market environments form the bottom (meta) layer of its three-layer architecture (Financial Applications, DRL Agents, Market Environments), providing Gymnasium-compatible simulators where reinforcement learning agents learn sequential trading policies. As described in README.md, these environments expose OHLCV data, technical indicators, and turbulence indices as state observations, then reward agents based on portfolio value changes after executing continuous or discrete actions.

The environment layer is split across multiple sub-modules, each targeting a financial task. The finrl/README.md layout shows that env_stock_trading, env_cryptocurrency_trading, and env_portfolio_allocation live under finrl/meta/, while the applications layer in finrl/applications/Stock_NeurIPS2018 ties a specific environment variant (the original 2018 NeurIPS stock-trading env) to a reproducible training pipeline. Users instantiate an environment, wrap it with a DRL agent from finrl/agents/ (Stable Baselines 3, ElegantRL, or RLlib), then drive it through the canonical train.py → test.py → trade.py pipeline.

The v0.3.8 release (Stock Trading 2026 tutorial) standardizes the workflow on the five DRL agents — A2C, DDPG, PPO, TD3, SAC — and the classic StockTradingEnv family. Each of these algorithms has a slightly different interaction with the environment, and several community-reported bugs stem from this interaction (see Common Failure Modes).

Stock Trading Environments and Variants

The most-used environment is StockTradingEnv in finrl/meta/env_stock_trading/env_stocktrading.py. It accepts a DataFrame containing price, technical-indicator, and turbulence columns plus a list of stock tickers, then exposes a multi-dimensional continuous action space with one action per stock, which the environment translates into a buy or sell order for that stock. The environment tracks cash, holdings, and portfolio value, computes a reward from the change in portfolio value, and applies transaction-cost penalties.
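
The bookkeeping described above (cash, holdings, reward as the change in portfolio value, transaction costs) can be sketched as a single step function. This is a simplified illustration rather than the environment's code: the function name step_portfolio, the hmax scaling factor, the single cost_pct rate, and the long-only limits are assumptions for the example, and turbulence handling is omitted.

```python
def step_portfolio(cash, holdings, prices, actions, hmax=100, cost_pct=0.001):
    """Apply one vector of actions in [-1, 1]; return (cash, holdings, reward)."""
    value_before = cash + sum(h * p for h, p in zip(holdings, prices))
    holdings = list(holdings)
    # Scale each normalized action to a whole number of shares.
    share_orders = [int(a * hmax) for a in actions]
    # Process sells first so that their proceeds can fund the buys.
    for i, order in enumerate(share_orders):
        if order < 0:
            sold = min(-order, holdings[i])  # cannot sell more than is held
            cash += sold * prices[i] * (1 - cost_pct)
            holdings[i] -= sold
    for i, order in enumerate(share_orders):
        if order > 0:
            affordable = int(cash // (prices[i] * (1 + cost_pct)))
            bought = min(order, affordable)  # cannot spend more than the cash balance
            cash -= bought * prices[i] * (1 + cost_pct)
            holdings[i] += bought
    value_after = cash + sum(h * p for h, p in zip(holdings, prices))
    # Reward is the change in total portfolio value, net of transaction costs.
    return cash, holdings, value_after - value_before


cash, holdings, reward = step_portfolio(
    cash=10_000.0, holdings=[30, 0], prices=[50.0, 20.0], actions=[-0.25, 0.5]
)
print(cash, holdings, round(reward, 4))
```

With unchanged prices the reward is negative, because the only effect of the step is the transaction cost paid on both legs.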

The repository ships several variants for research experiments:

Variant	File	Purpose
`StockTradingEnv`	env_stocktrading.py	Default equity trading env with turbulence-aware risk control
NumPy port	env_stocktrading_np.py	Lightweight NumPy reimplementation for faster stepping
Cash-penalty variant	env_stocktrading_cashpenalty.py	Penalizes idle cash to encourage full capital deployment
Stop-loss variant	env_stocktrading_stoploss.py	Triggers forced exits when a position drawdown threshold is breached
Paper-trading env	env_stock_papertrading.py	Live Alpaca integration for paper-account execution

The NeurIPS 2018 tutorial (finrl/applications/Stock_NeurIPS2018/README.md) walks through this env end-to-end: a data notebook produces train.csv and trade.csv; the training notebook wraps the env with e_train_gym.get_sb_env() for Stable Baselines 3; the backtest notebook compares the trained agent against Mean-Variance Optimization and the DJIA benchmark.

State construction concatenates the current balance, holdings vector, closing prices, technical indicators (MACD, Bollinger bands, RSI, DX, 30- and 60-day SMAs, per the README.md feature list), and turbulence into a single observation vector. The reset method, however, is sensitive to the Gym/Gymnasium API split, which is the source of one of the most common user errors (issue #1013).
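
A minimal sketch of that concatenation follows. It mirrors the order given in the paragraph above; the function name build_state, the dictionary layout for indicators, and the exact position of each block are assumptions, so the real layout should be checked against env_stocktrading.py.

```python
def build_state(cash, holdings, close_prices, indicators, turbulence):
    """Flatten one trading day into a single observation list.

    indicators maps an indicator name to one value per ticker.
    """
    state = [cash]
    state.extend(holdings)
    state.extend(close_prices)
    for name in sorted(indicators):  # fixed order keeps the layout stable
        state.extend(indicators[name])
    state.append(turbulence)
    return state


state = build_state(
    cash=1_000_000.0,
    holdings=[0, 0, 0],
    close_prices=[10.0, 20.0, 30.0],  # toy prices for three tickers
    indicators={"macd": [0.4, -1.2, 0.1], "rsi_30": [55.0, 48.3, 61.7]},
    turbulence=12.5,
)
# Length: 1 cash + 3 holdings + 3 prices + 2 * 3 indicator values + 1 turbulence
print(len(state))
```

Keeping the layout deterministic matters because a trained policy depends on each position of the vector carrying the same quantity at every step.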

Cryptocurrency and Portfolio Allocation Environments

Beyond equities, FinRL provides environments for two other asset classes.

Cryptocurrency trading lives in finrl/meta/env_cryptocurrency_trading/. It mirrors the stock-trading env but is designed for 24/7 crypto markets with no turbulence gating. CCXT is the canonical data source, as listed in the README.md data-source table, supporting 1-minute OHLCV with exchange-specific request limits.

Portfolio allocation lives in finrl/meta/env_portfolio_allocation/. The related portfolio-optimization environment follows a different action convention from the stock-trading env: according to finrl/meta/env_portfolio_optimization/README.md, it expects a portfolio vector, a 1-D Box of shape (n+1,) whose leading element is the cash weight and whose remaining n elements are the allocation weights across n assets. The input DataFrame requires a date column and a tic column, and weights are renormalized to sum to 1 after each step.
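
To make the (n+1,) portfolio vector concrete, the sketch below turns raw scores into weights that sum to 1. Softmax is used here only as one common way to normalize; the source does not say which normalization the environment applies, and the function name to_portfolio_vector is hypothetical.

```python
import math


def to_portfolio_vector(raw_action):
    """Map n+1 raw scores to non-negative weights that sum to 1 (softmax)."""
    shift = max(raw_action)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in raw_action]
    total = sum(exps)
    return [e / total for e in exps]


weights = to_portfolio_vector([0.0, 1.0, 1.0, -1.0])  # cash score + 3 asset scores
cash_weight, asset_weights = weights[0], weights[1:]
print(round(sum(weights), 6), len(asset_weights))
```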

A separate set of portfolio-optimization agents under finrl/agents/portfolio_optimization/ (documented in its README.md) implements architectures such as EI³ (an Inception-style CNN for multi-scale temporal features) and the classic Jiang et al. 2017 DPG framework for portfolio management.

flowchart LR
    A[Market Data OHLCV + Indicators] --> B[FinRL Market Environment]
    B --> C[Observation Vector]
    C --> D[DRL Agent SB3 / ElegantRL / RLlib]
    D --> E[Action Continuous / Portfolio Vector]
    E --> B
    B --> F[Reward + Portfolio Value]

Common Failure Modes

Community issues cluster around three recurring failure patterns that any new user of the market environments should be aware of:

Gymnasium API drift (issue #1013). StockTradingEnv.reset() does not accept a seed keyword in its current implementation, so calling env_train.get_sb_env() under Gymnasium 0.28.1 raises unexpected keyword argument 'seed'. Pinning to an older gym version or removing the seed argument at the call site is the standard workaround.

Off-policy / on-policy buffer mismatch (issue #1395). FinRL's training callback logs information from model.rollout_buffer, which is populated by on-policy algorithms (A2C, PPO). Off-policy algorithms such as DDPG, TD3, and SAC use replay_buffer instead, producing an AttributeError at log time. The fix is to guard the logging code on the buffer attribute actually present, or to branch on the algorithm type. The ON_POLICY_MODELS and OFF_POLICY_MODELS lists in finrl/agents/elegantrl/models.py can serve as a template, although ON_POLICY_MODELS there lists only "ppo", so A2C has to be handled explicitly.

Short-selling control (issue #1255). There is no first-class allow_short_selling flag in StockTradingEnv today. To prevent negative actions, the suggested approaches are to clip actions to non-negative values or to inflate the sell transaction cost in the env constructor so the agent self-discovers that shorting is unprofitable.

Additional reported issues (#671, #206, #222, #696) trace to environment shape mismatches, pyfolio/zipline import errors, and NaN propagation in the observation when technical indicators are computed on insufficient lookback windows. The v0.3.8 tutorial (examples/README.md) sidesteps most of these by fixing the data window (2014–2025 for training, 2026-01-01 to 2026-03-20 for trading) and using a curated indicator set.

DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors


Overview

FinRL ships a pluggable agent layer that lets users train Deep Reinforcement Learning (DRL) trading policies using three external libraries. The framework follows a three-layer architecture: applications (trading tasks), agents (DRL algorithms), and meta (market environments). The agents layer is the abstraction point where the user picks a backend; the same StockTradingEnv or PortfolioOptimizationEnv can be reused across all backends.

Source: finrl/README.md:1-30

The official FinRL Stock Trading 2026 tutorial exercises five DRL algorithms on Dow 30 data — A2C, DDPG, PPO, SAC, TD3 — all backed by Stable Baselines 3 (SB3). Source: examples/README.md:30-50.

Agent Backends

Stable Baselines 3 (`finrl/agents/stablebaselines3/models.py`)

SB3 is the default and most commonly used backend. The DRLAgent class wraps SB3 model classes and exposes get_model, train_model, DRL_prediction, and DRL_prediction_load_from_file.

The supported model registry is:

MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}

Source: finrl/agents/stablebaselines3/models.py:18-19

Default hyperparameters are pulled from finrl/config.py via:

MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}

Source: finrl/agents/stablebaselines3/models.py:21
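
The lookup pattern in that comprehension can be tried in isolation. In the sketch below, config and MODELS are stand-ins built for the example (the real ones are finrl/config.py and the SB3 class registry), and the hyperparameter values are placeholders rather than FinRL's defaults.

```python
from types import SimpleNamespace

# Stand-in for finrl/config.py; the parameter values are placeholders.
config = SimpleNamespace(
    A2C_PARAMS={"n_steps": 5, "learning_rate": 0.0007},
    PPO_PARAMS={"n_steps": 2048, "learning_rate": 0.00025},
)
# Stand-in registry; the real one maps names to SB3 model classes.
MODELS = {"a2c": object, "ppo": object}

# Each key "x" is resolved to the attribute named "<X>_PARAMS" on config.
MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}
print(MODEL_KWARGS["ppo"]["n_steps"])
```

One consequence of this design is that every registry key needs a matching <NAME>_PARAMS entry in the config: a missing one raises KeyError as soon as the dictionary is built.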

Action noise for off-policy methods (DDPG/TD3/SAC) is provided through NormalActionNoise and OrnsteinUhlenbeckActionNoise. The training loop integrates a TensorboardCallback (subclass of SB3's BaseCallback) for logging. Source: finrl/agents/stablebaselines3/models.py:11-16

ElegantRL (`finrl/agents/elegantrl/models.py`)

ElegantRL provides a lightweight, single-file DRL library. The DRLAgent class here is initialized with env, price_array, tech_array, and turbulence_array, and supports the same five algorithms plus the explicit on-/off-policy classification:

MODELS = {"ddpg": AgentDDPG, "td3": AgentTD3, "sac": AgentSAC,
          "ppo": AgentPPO, "a2c": AgentA2C}
OFF_POLICY_MODELS = ["ddpg", "td3", "sac"]
ON_POLICY_MODELS = ["ppo"]

Source: finrl/agents/elegantrl/models.py:14-21

train_model delegates to ElegantRL's train_agent, while DRL_prediction rebuilds an actor network from act.pth and Config(agent_class=agent_class, env_class=env_class, env_args=env_args). Source: finrl/agents/elegantrl/models.py:54-92

RLlib (`finrl/agents/rllib/models.py`)

RLlib (Ray) is recommended for distributed, multi-agent, or production-scale training. The model registry mirrors the other backends:

MODELS = {"a2c": a2c, "ddpg": ddpg, "td3": td3, "sac": sac, "ppo": ppo}

Source: finrl/agents/rllib/models.py:8

Each algorithm exposes a *Trainer (e.g. PPOTrainer, DDPGTrainer) used inside DRL_prediction. The config is built by copying *_DEFAULT_CONFIG and injecting env_config containing price_array, tech_array, turbulence_array, and if_train. Source: finrl/agents/rllib/models.py:60-90

Portfolio Optimization Agent (`finrl/agents/portfolio_optimization/models.py`)

This is a separate, dedicated agent for PortfolioOptimizationEnv. It ships only a PolicyGradient ("pg") algorithm and uses an EIIE convolutional policy architecture. The example in the module README configures model_kwargs={"lr": 0.01, "policy": EIIE} and trains for a number of episodes rather than timesteps. Source: finrl/agents/portfolio_optimization/README.md:10-40

Training Workflow

flowchart LR
    A[Market Data CSV] --> B[Data Processor / Indicators]
    B --> C[Train/Test Split]
    C --> D[StockTradingEnv or PortfolioOptimizationEnv]
    D --> E{Pick Agent Backend}
    E -->|SB3| F1[stable_baselines3]
    E -->|ElegantRL| F2[elegantrl.agents]
    E -->|RLlib| F3[ray.rllib]
    F1 --> G[Trained Model .zip / .pth]
    F2 --> G
    F3 --> G
    G --> H[DRL_prediction]
    H --> I[Backtest vs DJIA / MVO]

The split between train and trade data is performed by data_split from finrl.meta.preprocessor.preprocessors, which the SB3 models module imports. Source: finrl/agents/stablebaselines3/models.py:13

Common Training Errors

The community issue tracker surfaces several recurring failure modes that map directly to the agent layer.

1. `StockTradingEnv.reset() got an unexpected keyword argument 'seed'`

Reported on Google Colab with Gymnasium 0.28.1 in the Stock_NeurIPS2018_SB3.ipynb tutorial. The mismatch arises because Gymnasium-style callers invoke reset(seed=...), a keyword that the reset method in env_stocktrading.py did not yet accept. The fix is either to fall back to the legacy gym package (for example gym==0.21) or to make the env's reset accept a seed keyword. Source: github.com/AI4Finance-Foundation/FinRL/issues/1013
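
The second option can be illustrated with a thin wrapper. Both classes below are stand-ins written for this sketch: LegacyEnv imitates an environment whose reset() takes no keywords, and SeedTolerantEnv is a hypothetical adapter, not part of FinRL or Gymnasium. The (observation, info) return shape follows Gymnasium's reset convention.

```python
class LegacyEnv:
    """Stand-in for an environment whose reset() takes no keyword arguments."""

    def reset(self):
        return [0.0, 0.0, 0.0]


class SeedTolerantEnv:
    """Hypothetical wrapper that accepts Gymnasium-style reset(seed=..., options=...)."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        # The wrapped env cannot use the seed, so it is accepted and dropped.
        observation = self.env.reset()
        # Gymnasium's reset() returns an (observation, info) pair.
        return observation, {}


env = SeedTolerantEnv(LegacyEnv())
observation, info = env.reset(seed=42)
print(observation, info)
```

Dropping the seed silences the error but does not make episodes reproducible; an environment with its own randomness would need to seed it inside reset.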

2. `rollout_buffer` logging error for off-policy algorithms (DDPG/TD3/SAC)

In FinRL 0.3.8, the TensorboardCallback (in finrl/agents/stablebaselines3/models.py) records metrics that assume an on-policy rollout_buffer exists. Off-policy algorithms instead expose a replay_buffer, causing AttributeError on the first step. Workarounds include guarding the log with hasattr(self.model, "rollout_buffer") or registering a separate callback. Source: github.com/AI4Finance-Foundation/FinRL/issues/1395
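
The guard can be sketched with stand-in model objects. The two classes and the latest_rewards helper are hypothetical and use plain dictionaries in place of real SB3 buffers; only the attribute names rollout_buffer and replay_buffer come from the text above.

```python
class OnPolicyModel:
    """Stand-in for an on-policy model such as PPO."""
    rollout_buffer = {"rewards": [0.1, 0.2, 0.3]}


class OffPolicyModel:
    """Stand-in for an off-policy model such as SAC."""
    replay_buffer = {"rewards": [0.4, 0.5]}


def latest_rewards(model):
    """Return (buffer name, rewards) from whichever buffer the model has."""
    for attr in ("rollout_buffer", "replay_buffer"):
        buffer = getattr(model, attr, None)
        if buffer is not None:
            return attr, buffer["rewards"]
    return None, []  # nothing to log; skip instead of raising


for model in (OnPolicyModel(), OffPolicyModel(), object()):
    print(latest_rewards(model))
```

Returning an empty result for unknown models keeps logging optional, so a callback written this way cannot abort training.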

3. `Normal(loc, scale) invalid values` for single-stock training (issue #696)

A ValueError: Expected parameter loc ... to satisfy the constraint Real() is raised when the actor network outputs a non-finite mean (loc), which has been reported on a tiny state space (e.g. one stock). Suggested mitigations are gradient clipping, a lower learning rate, a bounded action policy, and checking the state arrays for NaNs. Source: github.com/AI4Finance-Foundation/FinRL/issues/696

4. Shape mismatch in `main.py --mode=train` (issues #206, #222)

cannot copy sequence with size 292 to array axis with dimension 301 and similar Could not broadcast input array from shape errors are almost always caused by df and tech_array having different lengths after indicator preprocessing. Verify that data_split is called after indicators are added and that the turbulence index is aligned to df.index. Source: github.com/AI4Finance-Foundation/FinRL/issues/206, github.com/AI4Finance-Foundation/FinRL/issues/222
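
A length check run before the environment is built turns these broadcast errors into a readable message. The helper name check_alignment is hypothetical, and the arrays are toy data sized to match the 292 versus 301 example in the error text.

```python
def check_alignment(arrays):
    """Raise ValueError if the named row sequences differ in length."""
    lengths = {name: len(rows) for name, rows in arrays.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"row counts differ: {lengths}")
    return next(iter(lengths.values()))


price_array = [[100.0]] * 301
tech_array = [[0.5, 48.0]] * 292  # shorter on purpose, to trigger the check
turbulence_array = [0.0] * 301

try:
    check_alignment(
        {"price": price_array, "tech": tech_array, "turbulence": turbulence_array}
    )
    message = "aligned"
except ValueError as err:
    message = str(err)
print(message)
```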

5. `AttributeError` from `main.py` (issue #671)

An AttributeError whose traceback passes through the SystemExit(main()) call on line 152 of finrl/main.py typically means that finrl.config is missing required tickers (e.g. DOW_30_TICKER) or that the data downloader returned an empty DataFrame. Source: github.com/AI4Finance-Foundation/FinRL/issues/671

Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications


Overview

FinRL provides a complete train–test–trade workflow that ties together market data ingestion, feature engineering, DRL agent training, backtesting, and (optionally) live paper trading through the Alpaca brokerage. The framework is organized as a three-layer architecture: applications (financial tasks such as stock trading, crypto trading, portfolio allocation, high-frequency trading), agents (DRL algorithms from ElegantRL, RLlib, and Stable Baselines 3), and meta (Gym-style market environments, data processors, and preprocessors) — as documented in README.md.

The end-to-end pipeline is orchestrated by finrl/main.py, which delegates to the three entry points described in finrl/README.md: train.py, test.py, and trade.py. Users can also follow the streamlined 2026 tutorial split into three scripts (examples/FinRL_StockTrading_2026_1_data.py, _2_train.py, _3_Backtest.py).

Data Pipeline

Architecture and Data Sources

The data layer is unified by finrl/meta/data_processor.py, which exposes a DataProcessor wrapper that delegates to provider-specific backends (Yahoo Finance, Alpaca, CCXT, JoinQuant, WRDS, etc.) under finrl/meta/data_processors/. The README.md data-source table lists coverage including AkShare, Alpaca, Baostock, Binance, CCXT, EODhistoricaldata, IEXCloud, JoinQuant, QuantConnect, RiceQuant, Sinopac, Tushare, WRDS, and YahooFinance.

Each processor implements a common contract: download_data, clean_data, add_technical_indicator, add_vix, and df_to_array. The Yahoo Finance processor in finrl/meta/data_processors/processor_yahoofinance.py downloads OHLCV history, handles multi-index columns returned by yfinance, and adds indicators such as MACD, RSI, CCI, ADX, and the turbulence index.
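
The five-method contract can be sketched as an abstract base class with a toy implementation. Only the method names come from the text above; the class names, the argument lists, and the list-of-dicts data model are assumptions made so the example runs without pandas or any vendor API.

```python
from abc import ABC, abstractmethod


class BaseProcessor(ABC):
    """Hypothetical base class listing the five steps named above."""

    @abstractmethod
    def download_data(self, tickers, start, end): ...
    @abstractmethod
    def clean_data(self, rows): ...
    @abstractmethod
    def add_technical_indicator(self, rows, indicators): ...
    @abstractmethod
    def add_vix(self, rows): ...
    @abstractmethod
    def df_to_array(self, rows, indicators): ...


class InMemoryProcessor(BaseProcessor):
    """Toy implementation backed by a list of dicts instead of a vendor API."""

    def __init__(self, rows):
        self._rows = rows

    def download_data(self, tickers, start, end):
        return [r for r in self._rows
                if r["tic"] in tickers and start <= r["date"] <= end]

    def clean_data(self, rows):
        return [r for r in rows if r["close"] is not None]

    def add_technical_indicator(self, rows, indicators):
        # Placeholder values; a real processor computes MACD, RSI, and so on.
        return [{**r, **{name: 0.0 for name in indicators}} for r in rows]

    def add_vix(self, rows):
        return [{**r, "vix": 0.0} for r in rows]  # placeholder volatility column

    def df_to_array(self, rows, indicators):
        prices = [r["close"] for r in rows]
        tech = [[r[name] for name in indicators] for r in rows]
        return prices, tech


raw = [
    {"date": "2025-01-02", "tic": "AAA", "close": 100.0},
    {"date": "2025-01-03", "tic": "AAA", "close": None},
    {"date": "2025-01-06", "tic": "AAA", "close": 101.0},
]
processor = InMemoryProcessor(raw)
rows = processor.download_data(["AAA"], "2025-01-01", "2025-01-31")
rows = processor.add_vix(processor.add_technical_indicator(processor.clean_data(rows), ["macd"]))
prices, tech = processor.df_to_array(rows, ["macd"])
print(prices, tech)
```

Because every backend exposes the same steps, the calling code stays the same when the data source changes.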

Feature Engineering and Splits

The 2026 data script downloads DOW 30 tickers from Yahoo Finance, attaches technical indicators and the VIX/turbulence index, and partitions the data into a training set (2014–2025) and a trading set (2026-01-01 to 2026-03-20) saved as train_data.csv and trade_data.csv — see examples/FinRL_StockTrading_2026_1_data.py. The default indicator set referenced in README.md includes macd, boll_ub, boll_lb, rsi_30, dx_30, close_30_sma, and close_60_sma, but users can extend this list.
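
The date partition can be sketched with a small filter. The helper split_by_date and the toy rows are hypothetical; the half-open interval (start inclusive, end exclusive) is an assumption of this sketch, which is why the trading window below ends on 2026-03-21 in order to include 2026-03-20.

```python
def split_by_date(rows, start, end):
    """Keep rows with start <= date < end; ISO dates compare correctly as strings."""
    return [r for r in rows if start <= r["date"] < end]


rows = [
    {"date": "2025-12-30", "tic": "AAA", "close": 10.0},
    {"date": "2025-12-31", "tic": "AAA", "close": 10.2},
    {"date": "2026-01-02", "tic": "AAA", "close": 10.1},
    {"date": "2026-03-20", "tic": "AAA", "close": 10.6},
]
train = split_by_date(rows, "2014-01-01", "2026-01-01")
trade = split_by_date(rows, "2026-01-01", "2026-03-21")
print(len(train), len(trade))
```

Using a shared boundary date with one side exclusive guarantees that no row lands in both sets, which would otherwise leak trading data into training.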

flowchart LR
    A[Data Source<br/>Yahoo / Alpaca / CCXT] --> B[DataProcessor<br/>download_data]
    B --> C[clean_data]
    C --> D[add_technical_indicator<br/>+ VIX + turbulence]
    D --> E[Train / Trade Split<br/>CSV files]
    E --> F[StockTradingEnv]
    F --> G[DRL Agent<br/>A2C / PPO / DDPG / SAC / TD3]
    G --> H[Backtest / Plot]
    G --> I[Alpaca Paper Trading]

End-to-End Application Workflow

Train / Test / Trade Orchestration

finrl/main.py is the command-line entry point. It accepts --mode=train, --mode=test, or --mode=trade and routes to the matching module. The pipeline is parameterised by finrl/config.py (training window, time intervals, technical-indicator list, brokerage parameters, model hyperparameters) and finrl/config_tickers.py (ticker universes such as the DOW 30).
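
A mode dispatcher of this kind can be sketched with argparse. Only the --mode flag and its three values come from the text; the function names and the string-returning handlers are placeholders for the real train, test, and trade modules.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a train/test/trade dispatcher")
    parser.add_argument("--mode", choices=["train", "test", "trade"], required=True)
    return parser


def run(argv):
    """Parse argv and return the name of the stage that would run."""
    options = build_parser().parse_args(argv)
    handlers = {
        "train": lambda: "training",  # placeholder for the training pipeline
        "test": lambda: "testing",    # placeholder for backtesting
        "trade": lambda: "trading",   # placeholder for trading
    }
    return handlers[options.mode]()


print(run(["--mode=train"]))
```

Restricting --mode with choices makes argparse reject unknown modes with a usage message before any pipeline code runs.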

Step	Module	Responsibility
1. Configure	finrl/config.py	Define time ranges, indicators, agent hyperparameters
2. Train	finrl/train.py	Build `StockTradingEnv`, call `DRLAgent.train_model`, save checkpoints under `trained_models/`
3. Test	finrl/test.py	Replay a trained policy on the test set and emit account-value / action logs
4. Trade / Backtest	finrl/trade.py, finrl/plot.py	Compute Sharpe ratio, cumulative return, and compare with MVO / DJIA benchmarks

The trade.py step uses the same DRLAgent.DRL_prediction method to generate actions, then calls finrl/plot.py backtest_stats and backtest_plot for performance analytics. The 2026 tutorial mirrors this with a dedicated FinRL_StockTrading_2026_3_Backtest.py that compares agent returns against Mean-Variance Optimisation and DJIA — see the v0.3.8 release notes referenced in README.md.
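
The two headline metrics can be computed from an account-value series as follows. This is a sketch, not the code in finrl/plot.py: the function names are hypothetical, the risk-free rate is assumed to be zero, and 252 trading periods per year is a common convention rather than something stated in the source.

```python
import math
import statistics


def daily_returns(account_values):
    """Simple period-over-period returns."""
    return [b / a - 1 for a, b in zip(account_values, account_values[1:])]


def cumulative_return(account_values):
    return account_values[-1] / account_values[0] - 1


def annualized_sharpe(account_values, periods_per_year=252):
    """Sharpe ratio with a zero risk-free rate (an assumption of this sketch)."""
    returns = daily_returns(account_values)
    return math.sqrt(periods_per_year) * statistics.mean(returns) / statistics.stdev(returns)


values = [100.0, 110.0, 99.0, 108.9]  # toy account values
print(round(cumulative_return(values), 4), round(annualized_sharpe(values), 2))
```

The same functions can be applied to a benchmark series, so that agent and baseline are compared on identical definitions.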

Applications

The finrl/applications directory contains four task families referenced in finrl/README.md: stock_trading, cryptocurrency_trading, portfolio_allocation, and high_frequency_trading. Each application reuses the same three-layer structure and swaps in a domain-specific Env_* class from finrl/meta/.

Paper Trading (Alpaca)

Architecture

finrl/meta/paper_trading/alpaca.py implements an AlpacaPaperTrading class that authenticates against the Alpaca paper-trading API, streams intraday bars on a threaded schedule, and dispatches the buy/sell orders produced by a DRL-trained policy checkpoint. The default universe and timestamps are typically derived from finrl/config_tickers.py and the indicators configured in finrl/config.py.

Known Threading Pitfalls

The community has surfaced two related defects in the order-submission path:

Thread target called immediately (issue #1399): the original code used Thread(target=self.submitOrder(...)), which invokes submitOrder synchronously and returns None as the target. The corrected pattern is Thread(target=self.submitOrder, args=(...)).
Unread response lists (issue #1414): the respSO lists populated by each order thread are joined but never read, so submission errors or fills are silently dropped. Contributors are advised to inspect respSO after join() to surface failures.
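
Both points can be shown in a few lines with the standard threading module. The submit_order function below is a stand-in that only records success or failure, and the order tuples are toy data; it is not the Alpaca client.

```python
import threading


def submit_order(qty, symbol, side, responses):
    """Stand-in for a broker call; appends True on success, False on failure."""
    responses.append(qty > 0)


orders = [(5, "AAA", "buy"), (0, "BBB", "sell")]
threads, results = [], []
for qty, symbol, side in orders:
    responses = []  # one response list per order
    # Pass the callable and its arguments separately; do not call it here.
    thread = threading.Thread(target=submit_order, args=(qty, symbol, side, responses))
    thread.start()
    threads.append(thread)
    results.append((symbol, responses))

for thread in threads:
    thread.join()

# Read every response list after join() so that failures are not dropped.
failed = [symbol for symbol, responses in results if not responses or not all(responses)]
print(failed)
```

Writing Thread(target=submit_order(...)) instead would run the order on the calling thread and hand Thread a target of None, so the started thread would do nothing.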

Market-State Safety

Issue #1412 highlights that StockTradingEnv schedules trades via the timestamp alone and does not verify whether the exchange is actually open. This causes silent failures around holidays and DST transitions. A pre-trade market-state check (e.g., via a Headless Oracle–style signed manifest) has been proposed to close this gap.
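
A basic pre-trade check can be sketched with the standard library. It is far simpler than the signed-manifest proposal: the function name is_market_open is hypothetical, the 09:30 to 16:00 session is an assumed regular schedule, the holiday set must be supplied by the caller, and the timestamp is expected to be already converted to exchange-local time. Half-day sessions are not handled.

```python
from datetime import date, datetime, time


def is_market_open(local_dt, holidays, open_time=time(9, 30), close_time=time(16, 0)):
    """Return True if local_dt (exchange-local time) falls in a regular session."""
    if local_dt.weekday() >= 5:      # Saturday or Sunday
        return False
    if local_dt.date() in holidays:  # caller-supplied holiday calendar
        return False
    return open_time <= local_dt.time() < close_time


holidays = {date(2026, 1, 1)}  # example entry; a real calendar must come from the exchange
print(is_market_open(datetime(2026, 1, 1, 10, 0), holidays))
print(is_market_open(datetime(2026, 1, 2, 10, 0), holidays))
```

Calling such a check before each order submission lets the trading loop skip or log a closed-market tick instead of failing silently.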

Common Failure Modes

Beyond the Alpaca issues above, users frequently hit:

reset(seed=...) keyword error (#1013): StockTradingEnv does not accept a seed keyword on reset when wrapping with newer Gymnasium (gymnasium>=0.28). Pin to gym==0.21 or use env.reset() without a seed.
Off-policy logging crash (#1395): callbacks that read model.rollout_buffer blow up for DDPG / TD3 / SAC, which use replay_buffer instead. Gate the callback on hasattr(model, "rollout_buffer").
Shape mismatches (#222, #206): mismatches between feature count and price array length usually mean the technical-indicator list and the indicator column names in the CSV do not agree.
Invalid Normal distribution (#696): NaNs in state arrays — usually from unscaled prices or missing rows in a single-stock environment — produce non-finite loc values.
Short selling (#1255): StockTradingEnv has no allow_short_selling flag today; a long-only restriction controlled by such a flag has been proposed.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Found 12 structured pitfall item(s), including 6 high/blocking item(s). Top priority: Runtime risk - Runtime risk requires verification.

1. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1395

2. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1414

3. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1412

4. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/671

5. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1013

6. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: packet_text.keyword_scan | https://github.com/AI4Finance-Foundation/FinRL

7. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | https://github.com/AI4Finance-Foundation/FinRL

8. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL

9. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: No demo was detected for the project (`no_demo`).
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | https://github.com/AI4Finance-Foundation/FinRL

10. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: No demo was detected for the project (`no_demo`).
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | https://github.com/AI4Finance-Foundation/FinRL

11. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL

12. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL

Source: Doramagic discovery, validation, and Project Pack records
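
The recommended check repeated across the entries above (reproduce the official install and quickstart path in an isolated environment) can be sketched with Python's standard-library `venv` module. This is a minimal sketch, not the project's documented procedure: the helper names `create_isolated_env` and `install_command`, the directory name `finrl-check-env`, and the package spec `finrl` are assumptions made for illustration. Take the real install command from the official README.

```python
from __future__ import annotations

import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(base_dir: Path, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment under base_dir.

    Returns the path to the environment's own Python interpreter.
    The directory name is a hypothetical choice for this sketch.
    """
    env_dir = base_dir / "finrl-check-env"
    # clear=True wipes any previous environment at the same path.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(python_path: Path, package_spec: str) -> list[str]:
    """Build (but do not run) the pip command for the isolated interpreter."""
    return [str(python_path), "-m", "pip", "install", package_spec]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        py = create_isolated_env(Path(tmp), with_pip=False)
        # "finrl" is an assumed package spec; use the one from the README.
        print(" ".join(install_command(py, "finrl")))
```

The sketch only builds the install command instead of executing it, so the command can be reviewed against the linked issues first. Passing the resulting list to `subprocess.run` inside the throwaway environment then keeps any side effects away from the system interpreter.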

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 11
Count of project-level external discussion links exposed on this manual page.

Use: Review before install
Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using FinRL with real data or production workflows.

Google colab Stock_NeurIPS2018_SB3.ipynb StockTradingEnv.reset() got an - github / github_issue
DDPG / off-policy algorithms fail due to rollout_buffer logging in FinRL - github / github_issue
Question About Open-Sourcing the FinRL-DT Implementation - github / github_issue
Feature: Chart pattern similarity as observation/state for RL agents - github / github_issue
paper_trading/alpaca.py: submitOrder response list never read after thre - github / github_issue
Add pre-trade market state verification to StockTradingEnv - github / github_issue
Fix thread target invocation in paper_trading/alpaca.py (submitOrder cal - github / github_issue
Is there a way to prevent the FinRL model from doing any Short selling - github / github_issue
AttributeError when running "python main.py --mode=train" command - github / github_issue
v0.3.8 - github / github_release
Security or permission risk requires verification - GitHub / issue

Source: Project Pack community evidence and pitfall evidence
