Doramagic Project Pack · Human Manual
FinRL
FinRL Architecture, Three-Layer Framework & Project Layout
Related Pages
Related topics: Market Environments: StockTradingEnv, Crypto, Portfolio & Variants, DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors, Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications
Overview & Ecosystem Position
FinRL is the original open-source deep reinforcement learning (DRL) framework for finance, positioned as the classic end-to-end research and educational library within the broader AI4Finance ecosystem. It is explicitly distinguished from its successor FinRL-X / FinRL-Trading, which is the next-generation AI-native production stack. Source: README.md.
The ecosystem roadmap identifies four generations of libraries: FinRL-Meta (gym-style market environments), FinRL (classic end-to-end train-test-trade pipeline), ElegantRL (lightweight DRL algorithms), and FinRL-X (production-oriented modular infrastructure). Source: README.md.
Three-Layer Architecture
The core design follows a three-layer coupled architecture that separates trading tasks, RL algorithms, and market environments, so users can plug in a DRL library of their choice. Source: finrl/README.md.
```mermaid
flowchart TB
    A["Applications Layer<br/>(cryptocurrency_trading, stock_trading,<br/>portfolio_allocation, high_frequency_trading)"]
    B["Agents Layer<br/>(elegantrl, stablebaseline3, rllib,<br/>portfolio_optimization)"]
    C["Meta Layer<br/>(data_processors, preprocessor,<br/>env_stock_trading, env_cryptocurrency_trading,<br/>env_portfolio_allocation)"]
    A --> B
    B --> C
    C --> A
```

Applications Layer
This top layer contains the financial tasks and orchestration scripts. The repository currently ships with cryptocurrency_trading, high_frequency_trading, portfolio_allocation, and stock_trading as first-class task folders. The end-to-end train-test-trade pipeline is implemented across train.py, test.py, trade.py, and the entry point main.py. Source: finrl/README.md.
Agents Layer
This layer exposes DRL algorithm integrations through three backends plus a dedicated portfolio-optimization stack:
- ElegantRL: exposes `AgentDDPG`, `AgentTD3`, `AgentSAC`, `AgentPPO`, and `AgentA2C` through a `MODELS` dictionary, with `OFF_POLICY_MODELS = ["ddpg", "td3", "sac"]` and `ON_POLICY_MODELS = ["ppo"]`. Source: finrl/agents/elegantrl/models.py.
- Stable Baselines 3: used by the FinRL Stock Trading 2026 tutorial to train A2C, DDPG, PPO, SAC, and TD3. Source: examples/README.md.
- RLlib: production-grade distributed training backend. Source: finrl/README.md.
- Portfolio Optimization agents: implement a custom `PolicyGradient` algorithm following Jiang et al., with an optional EIIE convolutional architecture (`k_size` parameter) and an online-learning evaluation mode. Source: finrl/agents/portfolio_optimization/algorithms.py.
Meta Layer (Market Environments & Data)
The bottom layer houses Gymnasium-style market environments, data processors, and preprocessors. Environments are merged from the active FinRL-Meta repository, and include env_stock_trading, env_cryptocurrency_trading, env_portfolio_allocation, and env_portfolio_optimization. Source: finrl/README.md.
The portfolio-optimization environment, for example, expects a dataframe with date and tic columns and an action space shaped (n+1,) representing the cash plus n stock allocation percentages. Source: finrl/meta/env_portfolio_optimization/README.md.
The data-processor layer also supports 14+ external sources, including Alpaca, Baostock, CCXT, IEXCloud, JoinQuant, QuantConnect, RiceQuant, Tushare, and Yahoo Finance, providing OHLCV data plus technical indicators. Source: README.md.
Project File Structure
The top-level layout mirrors the three-layer architecture. Key files include:
| Path | Purpose |
|---|---|
| finrl/main.py | CLI entry point for --mode=train / test / trade |
| finrl/config.py | Global configuration: tickers, dates, indicators, hyperparameters |
| finrl/config_tickers.py | Symbol universe definitions (DOW 30, NASDAQ 100, etc.) |
| finrl/train.py | Training pipeline wrapper |
| finrl/test.py | Backtesting / evaluation |
| finrl/trade.py | Live / paper trading orchestration |
| finrl/plot.py | Performance visualization |
| finrl/agents/ | DRL algorithm backends |
| finrl/meta/ | Environments, preprocessors, data processors |
| finrl/applications/ | Stock, crypto, portfolio, HFT task scripts |
| examples/ | Standalone tutorials (e.g., Stock Trading 2026) |
| unit_tests/ | Environment and downloader tests |
Source: finrl/README.md and README.md.
Train-Test-Trade Pipeline & Hyperparameter Tuning
The canonical FinRL workflow, as demonstrated in the v0.3.8 Stock Trading 2026 tutorial, runs in three scripts: data download and preprocessing, DRL agent training (5 algorithms in one pass), and backtesting against baselines such as MVO and DJIA. Source: examples/README.md.
Hyperparameter search is supported via Optuna. The LoggingCallback class in tune_sb3.py tracks the previous_best_value study attribute, prunes trials after a configurable trial_number threshold, and applies a patience window for Sharpe-ratio improvement. Source: finrl/agents/stablebaselines3/tune_sb3.py.
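The early-stopping idea described above can be sketched without FinRL or Optuna installed. The class below is a hypothetical re-implementation, not the code in tune_sb3.py: it follows Optuna's study-callback protocol (a callable receiving `study` and `trial`), treats `study.best_value` as the Sharpe ratio being maximized, and calls `study.stop()` once the best value has stalled for more than `patience` trials after a warm-up period. The parameter names `threshold`, `warmup_trials`, and `patience` are assumptions chosen to mirror the description.

```python
class PatienceStopCallback:
    """Hypothetical Optuna-style study callback with a patience window.

    Stops the study when the best objective value (for example a Sharpe
    ratio) has not moved by more than `threshold` for more than `patience`
    consecutive trials, counted only once `warmup_trials` trials have run.
    """

    def __init__(self, threshold: float, warmup_trials: int, patience: int) -> None:
        self.threshold = threshold
        self.warmup_trials = warmup_trials
        self.patience = patience
        self._reference_best = None  # last best value that counted as an improvement
        self._stale_trials = 0       # trials since that improvement

    def __call__(self, study, trial) -> None:
        best = study.best_value
        if self._reference_best is None or abs(best - self._reference_best) > self.threshold:
            # First trial, or a meaningful change in the best value: reset patience.
            self._reference_best = best
            self._stale_trials = 0
            return
        if trial.number < self.warmup_trials:
            return  # still warming up; do not count stalls yet
        self._stale_trials += 1
        if self._stale_trials > self.patience:
            study.stop()  # Optuna lets the current trial finish, then optimize() returns
```

With Optuna installed, an instance would be passed through the standard hook, e.g. `study.optimize(objective, n_trials=50, callbacks=[PatienceStopCallback(0.01, 5, 3)])`.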
Known Limitations & Community-Reported Issues
Several recurring community issues map directly to architectural seams in this three-layer coupling:
- Gymnasium API drift: `StockTradingEnv.reset()` does not accept the `seed` keyword that Gymnasium 0.28.1 passes, breaking older tutorials. Source: community issue #1013.
- Off-policy vs. on-policy callback mismatch: FinRL 0.3.8 logging assumes a `rollout_buffer`, but DDPG/TD3/SAC use a `replay_buffer`, causing training failures. Source: community issue #1395.
- Threading bug in paper trading: `paper_trading/alpaca.py` called `submitOrder` immediately instead of passing it as the thread's `target` argument. Source: community issues #1399 and #1414.
- Short-selling controls: a proposed `allow_short_selling: bool = True` parameter would extend `StockTradingEnv` action bounds. Source: community issue #1255.
These issues reflect the framework's status as a research prototype: the README itself redirects production users to FinRL-X / FinRL-Trading, which is described as fully decoupled, type-safe (Pydantic), and production-oriented. Source: README.md.
Source: https://github.com/AI4Finance-Foundation/FinRL / Human Manual
Market Environments: StockTradingEnv, Crypto, Portfolio & Variants
Related Pages
Related topics: FinRL Architecture, Three-Layer Framework & Project Layout, DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors, Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications
Overview and Role in the FinRL Stack
FinRL's market environments form the middle layer of its three-layer architecture (Market Environments, DRL Agents, Financial Applications), providing Gymnasium-compatible simulators where reinforcement learning agents learn sequential trading policies. As described in README.md, these environments expose OHLCV data, technical indicators, and turbulence indices as state observations, then reward agents based on portfolio value changes after executing continuous or discrete actions.
The environment layer is split across multiple sub-modules, each targeting a financial task. The finrl/README.md layout shows that env_stock_trading, env_cryptocurrency_trading, and env_portfolio_allocation live under finrl/meta/, while the applications layer in finrl/applications/Stock_NeurIPS2018 ties a specific environment variant (the original 2018 NeurIPS stock-trading env) to a reproducible training pipeline. Users instantiate an environment, wrap it with a DRL agent from finrl/agents/ (Stable Baselines 3, ElegantRL, or RLlib), then drive it through the canonical train.py → test.py → trade.py pipeline.
The v0.3.8 release (Stock Trading 2026 tutorial) standardizes the workflow on the five DRL agents — A2C, DDPG, PPO, TD3, SAC — and the classic StockTradingEnv family. Each of these algorithms has a slightly different interaction with the environment, and several community-reported bugs stem from this interaction (see Common Failure Modes).
Stock Trading Environments and Variants
The most-used environment is StockTradingEnv in finrl/meta/env_stock_trading/env_stocktrading.py. It accepts a DataFrame containing price, technical-indicator, and turbulence columns plus a list of stock tickers, then exposes a multi-dimensional continuous action space with one action per stock, which the environment scales by `hmax` into a number of shares to buy (positive) or sell (negative). The environment tracks cash, holdings, and portfolio value, computes a reward from the change in portfolio value, and charges transaction costs on each trade.
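To make the buy/sell and reward mechanics concrete, here is a simplified, self-contained sketch of one trading step. It is not FinRL's implementation: `step_sketch` and `Account` are hypothetical, and the parameter names `hmax`, `buy_cost_pct`, `sell_cost_pct`, and `reward_scaling` are borrowed from `StockTradingEnv`'s constructor on the assumption that they mean what the comments say. Each action is clipped to [-1, 1] and scaled by `hmax` into a share count; sells run before buys so sale proceeds can fund purchases, sells are capped at current holdings, and buys are capped by available cash.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account state (not a FinRL type)."""
    cash: float
    holdings: list[int] = field(default_factory=list)


def portfolio_value(acct: Account, prices: list[float]) -> float:
    """Cash plus the market value of every position."""
    return acct.cash + sum(h * p for h, p in zip(acct.holdings, prices))


def step_sketch(acct, prices_now, prices_next, actions, hmax=100,
                buy_cost_pct=0.001, sell_cost_pct=0.001, reward_scaling=1e-4):
    """Hypothetical single step: trade at prices_now, score at prices_next."""
    begin_value = portfolio_value(acct, prices_now)
    # Clip each action to [-1, 1] and scale it into an integer share count.
    orders = [int(max(-1.0, min(1.0, a)) * hmax) for a in actions]

    # Sells first, capped at current holdings so positions never go negative.
    for i, shares in enumerate(orders):
        if shares < 0:
            qty = min(-shares, acct.holdings[i])
            acct.holdings[i] -= qty
            acct.cash += qty * prices_now[i] * (1 - sell_cost_pct)

    # Then buys, capped by the cash available after fees.
    for i, shares in enumerate(orders):
        if shares > 0:
            unit_cost = prices_now[i] * (1 + buy_cost_pct)
            qty = min(shares, int(acct.cash // unit_cost))
            acct.holdings[i] += qty
            acct.cash -= qty * unit_cost

    end_value = portfolio_value(acct, prices_next)
    # Reward is the scaled change in total portfolio value.
    return (end_value - begin_value) * reward_scaling
```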
The repository ships several variants for research experiments:
| Variant | File | Purpose |
|---|---|---|
| StockTradingEnv | env_stocktrading.py | Default equity trading env with turbulence-aware risk control |
| NumPy port | env_stocktrading_np.py | Lightweight NumPy reimplementation for faster stepping |
| Cash-penalty variant | env_stocktrading_cashpenalty.py | Penalizes idle cash to encourage full capital deployment |
| Stop-loss variant | env_stocktrading_stoploss.py | Triggers forced exits when a position drawdown threshold is breached |
| Paper-trading env | env_stock_papertrading.py | Live Alpaca integration for paper-account execution |
The NeurIPS 2018 tutorial (finrl/applications/Stock_NeurIPS2018/README.md) walks through this env end-to-end: a data notebook produces train.csv and trade.csv; the training notebook wraps the env with e_train_gym.get_sb_env() for Stable Baselines 3; the backtest notebook compares the trained agent against Mean-Variance Optimization and the DJIA benchmark.
State construction concatenates the current balance, holdings vector, closing prices, technical indicators (MACD, Bollinger bands, RSI, DX, 30- and 60-day SMAs, per the README.md feature list), and turbulence into a single observation vector. The reset method, however, is sensitive to the Gym/Gymnasium API split, which is the source of one of the most common user errors (issue #1013).
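FinRL's stock-trading tutorials size this observation as `1 + 2 * stock_dim + len(INDICATORS) * stock_dim`: one cash slot, one closing price and one holding per stock, and one value per indicator per stock (no separate turbulence slot appears in that formula). The sketch below assembles a flat vector in that layout; `build_state` is a hypothetical helper, and the ordering it uses is an assumption rather than a copy of the env's code. For 30 tickers and eight indicators the formula gives 1 + 60 + 240 = 301.

```python
def state_dim(stock_dim: int, n_indicators: int) -> int:
    """Observation length: cash + prices + holdings + indicators per stock."""
    return 1 + 2 * stock_dim + n_indicators * stock_dim


def build_state(cash, close_prices, holdings, indicators, indicator_names):
    """Hypothetical flat observation: [cash, prices..., holdings..., ind_1..., ind_k...].

    `indicators` maps an indicator name to its per-stock values for the current day.
    """
    if len(close_prices) != len(holdings):
        raise ValueError("one price and one holding are required per stock")
    state = [float(cash), *close_prices, *holdings]
    for name in indicator_names:
        values = indicators[name]
        if len(values) != len(close_prices):
            raise ValueError(
                f"indicator {name!r} has {len(values)} values, expected {len(close_prices)}"
            )
        state.extend(values)
    return state
```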
Cryptocurrency and Portfolio Allocation Environments
Beyond equities, FinRL provides environments for two other asset classes.
Cryptocurrency trading lives in finrl/meta/env_cryptocurrency_trading/. It mirrors the stock-trading env but is designed for 24/7 crypto markets with no turbulence gating. CCXT is the canonical data source, as listed in the README.md data-source table, supporting 1-minute OHLCV with exchange-specific request limits.
Portfolio allocation lives in finrl/meta/env_portfolio_allocation/, and the related portfolio-optimization environment follows a different action convention. According to finrl/meta/env_portfolio_optimization/README.md, that environment expects a *portfolio vector*: a 1-D Box of shape (n+1,) where the leading element is the cash weight and the remaining n elements are the allocation weights across n assets. The input DataFrame requires a date column and a tic column, and weights are renormalized to sum to 1 after each step.
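A minimal sketch of the portfolio-vector convention, assuming renormalization by clipping negative entries and dividing by the sum (the environment's actual normalization, for example a softmax, may differ). `to_portfolio_vector` and `growth_factor` are hypothetical helpers; the second computes the one-period growth of portfolio value as the dot product of the weights with the price relatives, using a fixed relative of 1 for cash.

```python
def to_portfolio_vector(raw_action):
    """Hypothetical: map a raw (n+1,) action to weights [cash, a_1, ..., a_n] summing to 1."""
    clipped = [max(0.0, float(a)) for a in raw_action]
    total = sum(clipped)
    if total == 0.0:
        # Degenerate action: hold everything in cash.
        return [1.0] + [0.0] * (len(clipped) - 1)
    return [a / total for a in clipped]


def growth_factor(weights, price_relatives):
    """One-period value multiplier: sum_i w_i * (p_i,t / p_i,t-1), cash relative fixed at 1."""
    if len(weights) != len(price_relatives):
        raise ValueError("weights and price relatives must both have length n+1")
    return sum(w * y for w, y in zip(weights, price_relatives))
```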
A separate portfolio-optimization agent set under finrl/agents/portfolio_optimization/README.md implements architectures such as EI³ (Inception-style CNN for multi-scale temporal features) and the classic Jiang et al. 2017 DPG framework for portfolio management.
```mermaid
flowchart LR
    A[Market Data OHLCV + Indicators] --> B[FinRL Market Environment]
    B --> C[Observation Vector]
    C --> D[DRL Agent SB3 / ElegantRL / RLlib]
    D --> E[Action Continuous / Portfolio Vector]
    E --> B
    B --> F[Reward + Portfolio Value]
```

Common Failure Modes
Community issues cluster around three recurring failure patterns that any new user of the market environments should be aware of:
- Gymnasium API drift (issue #1013). `StockTradingEnv.reset()` does not accept a `seed` keyword in its current implementation, so calling `env_train.get_sb_env()` under Gymnasium 0.28.1 raises `unexpected keyword argument 'seed'`. Pinning to an older `gym` version or removing the `seed` argument at the call site is the standard workaround.
- Off-policy / on-policy buffer mismatch (issue #1395). FinRL's training callback logs information from `model.rollout_buffer`, which is populated by on-policy algorithms (A2C, PPO). Off-policy algorithms such as DDPG, TD3, and SAC use `replay_buffer` instead, producing an `AttributeError` at log time. The fix is to guard the logging code on the buffer attribute actually present, or to branch on `ON_POLICY_MODELS` versus `OFF_POLICY_MODELS` as defined in finrl/agents/elegantrl/models.py; note that `ON_POLICY_MODELS` there lists only `"ppo"`, so A2C must be handled explicitly in such a branch.
- Short-selling control (issue #1255). There is no first-class `allow_short_selling` flag in `StockTradingEnv` today. To prevent negative actions, the suggested approaches are to clip actions to non-negative values or to inflate the sell transaction cost in the env constructor so the agent self-discovers that shorting is unprofitable.
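One way to apply the seed workaround without editing the environment is a thin adapter that accepts Gymnasium's keyword-only `reset(seed=..., options=...)` call and forwards a plain `reset()` to the legacy env. The sketch is hypothetical and dependency-free; it assumes you know whether the wrapped `reset()` returns a bare observation or an `(observation, info)` pair, and it seeds only Python's `random` module, so NumPy or framework RNGs inside the env would still need their own seeding.

```python
import random


class SeedCompatibleReset:
    """Hypothetical adapter giving a legacy env a Gymnasium-style reset signature."""

    def __init__(self, env, legacy_returns_info: bool = False):
        self.env = env
        # Set True if the wrapped reset() already returns (obs, info).
        self.legacy_returns_info = legacy_returns_info

    def reset(self, *, seed=None, options=None):
        # `options` is accepted for signature compatibility and ignored here.
        if seed is not None:
            random.seed(seed)  # only Python's RNG; seed NumPy etc. separately
        result = self.env.reset()  # the legacy call takes no keywords
        if self.legacy_returns_info:
            obs, info = result
        else:
            obs, info = result, {}
        return obs, info

    def __getattr__(self, name):
        # Delegate step(), action_space, etc. to the wrapped environment.
        return getattr(self.env, name)
```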
Additional reported issues (#671, #206, #222, #696) trace to environment shape mismatches, pyfolio/zipline import errors, and NaN propagation in the observation when technical indicators are computed on insufficient lookback windows. The v0.3.8 tutorial (examples/README.md) sidesteps most of these by fixing the data window (2014–2025 for training, 2026-01-01 to 2026-03-20 for trading) and using a curated indicator set.
See Also
- FinRL Repository Overview
- FinRL-Meta: Market Environments and Benchmarks
- FinRL-X / FinRL-Trading (next-generation stack)
- Stock Trading 2026 tutorial scripts under `examples/`
- Stable Baselines 3 integration via the environment's `get_sb_env()` method
DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors
Related Pages
Related topics: FinRL Architecture, Three-Layer Framework & Project Layout, Market Environments: StockTradingEnv, Crypto, Portfolio & Variants, Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications
Overview
FinRL ships a pluggable agent layer that lets users train Deep Reinforcement Learning (DRL) trading policies using three external libraries. The framework follows a three-layer architecture: applications (trading tasks), agents (DRL algorithms), and meta (market environments). The agents layer is the abstraction point where the user picks a backend; the same StockTradingEnv or PortfolioOptimizationEnv can be reused across all backends.
Source: finrl/README.md:1-30
The official FinRL Stock Trading 2026 tutorial exercises five DRL algorithms on Dow 30 data — A2C, DDPG, PPO, SAC, TD3 — all backed by Stable Baselines 3 (SB3). Source: examples/README.md:30-50.
Agent Backends
Stable Baselines 3 (`finrl/agents/stablebaselines3/models.py`)
SB3 is the default and most commonly used backend. The DRLAgent class wraps SB3 model classes and exposes get_model, train_model, DRL_prediction, and DRL_prediction_load_from_file.
The supported model registry is:
```python
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3

MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}
```
Source: finrl/agents/stablebaselines3/models.py:18-19
Default hyperparameters are pulled from finrl/config.py via:
```python
MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}
```
Source: finrl/agents/stablebaselines3/models.py:21
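The registry-plus-config pattern can be illustrated without SB3 installed. In this sketch `config` is a stand-in namespace with placeholder values (not FinRL's defaults), the model classes are replaced by placeholder strings, and `resolve_model_kwargs` is a hypothetical helper showing how per-call overrides could be layered on top of the defaults the comprehension extracts.

```python
from types import SimpleNamespace

# Stand-in for finrl.config; the hyperparameter values are placeholders.
config = SimpleNamespace(
    A2C_PARAMS={"n_steps": 5, "learning_rate": 7e-4},
    PPO_PARAMS={"n_steps": 2048, "learning_rate": 2.5e-4, "batch_size": 64},
)

# Placeholders for the SB3 classes A2C and PPO.
MODELS = {"a2c": "A2C", "ppo": "PPO"}

# Same comprehension as in models.py: look up <NAME>_PARAMS for each registered model.
MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}


def resolve_model_kwargs(model_name, overrides=None):
    """Hypothetical helper: copy the defaults, then apply caller overrides."""
    if model_name not in MODELS:
        raise NotImplementedError(f"unsupported model: {model_name}")
    kwargs = dict(MODEL_KWARGS[model_name])  # copy so the defaults are not mutated
    kwargs.update(overrides or {})
    return kwargs
```

Because the comprehension indexes `config.__dict__` directly, registering a model without a matching `*_PARAMS` entry fails with `KeyError` as soon as the module is imported.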
Action noise for off-policy methods (DDPG/TD3/SAC) is provided through NormalActionNoise and OrnsteinUhlenbeckActionNoise. The training loop integrates a TensorboardCallback (subclass of SB3's BaseCallback) for logging. Source: finrl/agents/stablebaselines3/models.py:11-16
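For intuition about the Ornstein-Uhlenbeck option, the process pulls the previous noise value back toward a mean while adding scaled Gaussian shocks: `x_next = x + theta * (mean - x) * dt + sigma * sqrt(dt) * eps`. The standalone sketch below implements that recurrence for a single action dimension with Python's `random` module. It is for illustration only; in FinRL code you would use SB3's class, e.g. `OrnsteinUhlenbeckActionNoise(mean=np.zeros(n), sigma=0.1 * np.ones(n))`. The defaults `theta=0.15` and `dt=1e-2` are assumed to match SB3's.

```python
import math
import random


class OUNoise:
    """Single-dimension Ornstein-Uhlenbeck noise (illustrative sketch, not SB3's class)."""

    def __init__(self, mean=0.0, sigma=0.2, theta=0.15, dt=1e-2, initial=0.0, seed=None):
        self.mean, self.sigma, self.theta, self.dt = mean, sigma, theta, dt
        self.initial = initial
        self.x = initial
        self._rng = random.Random(seed)

    def __call__(self) -> float:
        # Mean reversion toward `mean` plus a Gaussian shock scaled by sqrt(dt).
        shock = self._rng.gauss(0.0, 1.0)
        self.x = (self.x
                  + self.theta * (self.mean - self.x) * self.dt
                  + self.sigma * math.sqrt(self.dt) * shock)
        return self.x

    def reset(self) -> None:
        # Call at episode boundaries so noise does not carry across episodes.
        self.x = self.initial
```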
ElegantRL (`finrl/agents/elegantrl/models.py`)
ElegantRL provides a lightweight DRL library. The DRLAgent class here is initialized with env, price_array, tech_array, and turbulence_array, and supports the same five algorithms plus an explicit on-/off-policy classification:
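```python
MODELS = {"ddpg": AgentDDPG, "td3": AgentTD3, "sac": AgentSAC,
          "ppo": AgentPPO, "a2c": AgentA2C}
OFF_POLICY_MODELS = ["ddpg", "td3", "sac"]
ON_POLICY_MODELS = ["ppo"]
```

Source: finrl/agents/elegantrl/models.py:14-21. Note that `a2c` is registered in `MODELS` but appears in neither `OFF_POLICY_MODELS` nor `ON_POLICY_MODELS`.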
train_model delegates to ElegantRL's train_agent, while DRL_prediction rebuilds an actor network from act.pth and Config(agent_class=agent_class, env_class=env_class, env_args=env_args). Source: finrl/agents/elegantrl/models.py:54-92
RLlib (`finrl/agents/rllib/models.py`)
RLlib (Ray) is recommended for distributed, multi-agent, or production-scale training. The model registry mirrors the other backends:
```python
MODELS = {"a2c": a2c, "ddpg": ddpg, "td3": td3, "sac": sac, "ppo": ppo}
```
Source: finrl/agents/rllib/models.py:8
Each algorithm exposes a `*Trainer` class (e.g. `PPOTrainer`, `DDPGTrainer`) used inside `DRL_prediction`. The config is built by copying `*_DEFAULT_CONFIG` and injecting an `env_config` containing `price_array`, `tech_array`, `turbulence_array`, and `if_train`. Source: finrl/agents/rllib/models.py:60-90
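The copy-then-inject step can be sketched with plain dictionaries. `build_rllib_config` is a hypothetical helper and `default_config` a stand-in for an algorithm's `*_DEFAULT_CONFIG`; the point of the sketch is the deep copy, which keeps repeated calls from mutating the shared module-level defaults (including nested entries such as model settings).

```python
import copy


def build_rllib_config(default_config, price_array, tech_array, turbulence_array,
                       if_train=True, **overrides):
    """Hypothetical helper: copy defaults, then attach the env_config payload."""
    config = copy.deepcopy(default_config)  # never mutate the shared defaults
    config.update(overrides)
    config["env_config"] = {
        "price_array": price_array,
        "tech_array": tech_array,
        "turbulence_array": turbulence_array,
        "if_train": if_train,
    }
    return config
```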
Portfolio Optimization Agent (`finrl/agents/portfolio_optimization/models.py`)
A separate, dedicated agent for the PortfolioOptimizationEnv. It only ships a PolicyGradient ("pg") algorithm and uses an EIIE convolutional policy architecture. The example in the module README configures model_kwargs={"lr": 0.01, "policy": EIIE} and trains for episodes instead of timesteps. Source: finrl/agents/portfolio_optimization/README.md:10-40
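For orientation, the objective in Jiang et al. (2017), which this agent follows, maximizes the average logarithmic growth of portfolio value. The notation below is a summary under stated assumptions rather than FinRL's exact loss: the portfolio vector $w_{t-1}$ has the cash weight first, as in the environment's (n+1,) action; $v_{i,t}$ is asset $i$'s closing price; $\mu_t \in (0, 1]$ is the fraction of value remaining after transaction costs; and $T$ is the number of periods.

$$
y_t = \left(1, \frac{v_{1,t}}{v_{1,t-1}}, \ldots, \frac{v_{n,t}}{v_{n,t-1}}\right), \qquad
r_t = \ln\left(\mu_t \, y_t \cdot w_{t-1}\right), \qquad
R = \frac{1}{T} \sum_{t=1}^{T} r_t
$$

The leading 1 in $y_t$ is the cash relative, so holding only cash, $w_{t-1} = (1, 0, \ldots, 0)$, with no costs ($\mu_t = 1$) gives $r_t = 0$.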
Training Workflow
```mermaid
flowchart LR
    A[Market Data CSV] --> B[Data Processor / Indicators]
    B --> C[Train/Test Split]
    C --> D[StockTradingEnv or PortfolioOptimizationEnv]
    D --> E{Pick Agent Backend}
    E -->|SB3| F1[stable_baselines3]
    E -->|ElegantRL| F2[elegantrl.agents]
    E -->|RLlib| F3[ray.rllib]
    F1 --> G[Trained Model .zip / .pth]
    F2 --> G
    F3 --> G
    G --> H[DRL_prediction]
    H --> I[Backtest vs DJIA / MVO]
```

The split between training and trading data is performed by `data_split` from `finrl.meta.preprocessor.preprocessors`, which finrl/agents/stablebaselines3/models.py imports. Source: finrl/agents/stablebaselines3/models.py:13
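The idea behind `data_split` can be sketched in plain Python: keep rows whose date falls in a half-open `[start, end)` window, sort by date and ticker, and give every row of the same trading day the same integer index so the environment can step one day at a time. `split_by_date` is a hypothetical stand-in that works on a list of dicts rather than a pandas DataFrame, and the half-open interval is an assumption about FinRL's convention.

```python
def split_by_date(rows, start, end):
    """Hypothetical stand-in for data_split.

    Returns (day_index, row) pairs for start <= date < end, ordered by (date, tic).
    Dates are ISO strings ("YYYY-MM-DD"), so lexicographic order equals date order.
    """
    kept = sorted(
        (r for r in rows if start <= r["date"] < end),
        key=lambda r: (r["date"], r["tic"]),
    )
    day_of = {}
    for r in kept:
        day_of.setdefault(r["date"], len(day_of))  # 0 for the first day, 1 for the next, ...
    return [(day_of[r["date"]], r) for r in kept]
```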
Common Training Errors
The community issue tracker surfaces several recurring failure modes that map directly to the agent layer.
1. `StockTradingEnv.reset() got an unexpected keyword argument 'seed'`
Reported on Google Colab with Gymnasium 0.28.1 in the Stock_NeurIPS2018_SB3.ipynb tutorial. The mismatch comes from Gymnasium's stricter `reset(seed=...)` signature, which the `reset` method in env_stocktrading.py did not yet accept. The fix is either to pin `gymnasium<0.26` or to make the env's `reset` accept a `seed` keyword argument. Source: github.com/AI4Finance-Foundation/FinRL/issues/1013
2. `rollout_buffer` logging error for off-policy algorithms (DDPG/TD3/SAC)
In FinRL 0.3.8, the TensorboardCallback (in finrl/agents/stablebaselines3/models.py) records metrics that assume an on-policy rollout_buffer exists. Off-policy algorithms instead expose a replay_buffer, causing AttributeError on the first step. Workarounds include guarding the log with hasattr(self.model, "rollout_buffer") or registering a separate callback. Source: github.com/AI4Finance-Foundation/FinRL/issues/1395
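A hedged sketch of the `hasattr` guard: `buffer_metrics` is a hypothetical helper that inspects whichever buffer the model exposes instead of assuming `rollout_buffer`. It relies on two SB3 details assumed here: on-policy models keep rewards in `rollout_buffer.rewards` (a NumPy array, hence `.mean()`), and off-policy replay buffers report their fill level through `replay_buffer.size()`.

```python
def buffer_metrics(model) -> dict:
    """Hypothetical helper: collect numeric stats from the buffer the model actually has."""
    rollout = getattr(model, "rollout_buffer", None)
    if rollout is not None:  # on-policy algorithms such as A2C and PPO
        return {"rollout/reward_mean": float(rollout.rewards.mean())}
    replay = getattr(model, "replay_buffer", None)
    if replay is not None:  # off-policy algorithms such as DDPG, TD3, SAC
        return {"replay/size": int(replay.size())}
    return {}  # unknown layout: log nothing rather than crash
```

Inside an SB3 `BaseCallback._on_step`, each entry could be forwarded with `self.logger.record(key, value)` before returning `True`.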
3. `Normal(loc, scale) invalid values` for single-stock training (issue #696)
A `ValueError: Expected parameter loc ... to satisfy the constraint Real()` is raised when the actor network outputs non-finite (NaN) values for the action distribution's mean on a tiny state space (e.g. one stock). The fix is gradient clipping, a lower learning rate, or a bounded action policy. Source: github.com/AI4Finance-Foundation/FinRL/issues/696
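Because a NaN anywhere in the observation propagates into the actor's output, a cheap check is to scan observations for non-finite values before they reach the agent. `non_finite_positions` and `assert_finite` are hypothetical helpers; mapping the reported indices back onto your environment's state layout shows which price or indicator column is the culprit.

```python
import math


def non_finite_positions(state):
    """Hypothetical: return indices of NaN or +/-inf entries in a flat observation."""
    return [i for i, v in enumerate(state) if not math.isfinite(float(v))]


def assert_finite(state, label="observation"):
    """Raise early with a readable message instead of a distribution error later."""
    bad = non_finite_positions(state)
    if bad:
        raise ValueError(f"{label} has non-finite values at indices {bad[:10]}")
```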
4. Shape mismatch in `main.py --mode=train` (issues #206, #222)
Errors such as `cannot copy sequence with size 292 to array axis with dimension 301` and similar `Could not broadcast input array from shape` errors are almost always caused by `df` and `tech_array` having different lengths after indicator preprocessing. Verify that `data_split` is called after indicators are added and that the turbulence index is aligned to `df.index`. Source: github.com/AI4Finance-Foundation/FinRL/issues/206, github.com/AI4Finance-Foundation/FinRL/issues/222
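Because the flat state length depends on every trading day carrying the same tickers, a quick audit of the preprocessed table often explains a 292-versus-301 style mismatch. `misaligned_days` is a hypothetical helper over (date, tic) pairs: it reports the days whose ticker set differs from the full universe, typically days where indicator computation dropped rows or a ticker had no data.

```python
from collections import defaultdict


def misaligned_days(rows):
    """Hypothetical: return {date: sorted missing tickers} for incomplete days.

    `rows` is an iterable of (date, tic) pairs taken from the preprocessed table.
    """
    rows = list(rows)
    universe = {tic for _, tic in rows}
    by_day = defaultdict(set)
    for date, tic in rows:
        by_day[date].add(tic)
    return {
        date: sorted(universe - tics)
        for date, tics in sorted(by_day.items())
        if tics != universe
    }
```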
5. `AttributeError` from `main.py` (issue #671)
A SystemExit(main()) crash on line 152 of finrl/main.py typically means finrl.config is missing required tickers (e.g. DOW_30_TICKER) or the data downloader returned an empty DataFrame. Source: github.com/AI4Finance-Foundation/FinRL/issues/671
See Also
- Market Environments (`StockTradingEnv`, `PortfolioOptimizationEnv`)
- Data Processors and Technical Indicators
- Paper Trading via Alpaca
- FinRL-X / FinRL-Trading (next-generation stack)
Data Pipeline, Paper Trading (Alpaca) & End-to-End Applications
Related Pages
Related topics: FinRL Architecture, Three-Layer Framework & Project Layout, Market Environments: StockTradingEnv, Crypto, Portfolio & Variants, DRL Agents: Stable Baselines 3, ElegantRL, RLlib & Common Training Errors
Overview
FinRL provides a complete train–test–trade workflow that ties together market data ingestion, feature engineering, DRL agent training, backtesting, and (optionally) live paper trading through the Alpaca brokerage. The framework is organized as a three-layer architecture: applications (financial tasks such as stock trading, crypto trading, portfolio allocation, high-frequency trading), agents (DRL algorithms from ElegantRL, RLlib, and Stable Baselines 3), and meta (Gym-style market environments, data processors, and preprocessors) — as documented in README.md.
The end-to-end pipeline is orchestrated by finrl/main.py, which delegates to the three entry points described in finrl/README.md: train.py, test.py, and trade.py. Users can also follow the streamlined 2026 tutorial split into three scripts (examples/FinRL_StockTrading_2026_1_data.py, _2_train.py, _3_Backtest.py).
Data Pipeline
Architecture and Data Sources
The data layer is unified by finrl/meta/data_processor.py, which exposes a DataProcessor wrapper that delegates to provider-specific backends (Yahoo Finance, Alpaca, CCXT, JoinQuant, WRDS, etc.) under finrl/meta/data_processors/. The README.md data-source table lists coverage including AkShare, Alpaca, Baostock, Binance, CCXT, EODhistoricaldata, IEXCloud, JoinQuant, QuantConnect, RiceQuant, Sinopac, Tushare, WRDS, and YahooFinance.
Each processor implements a common contract: download_data, clean_data, add_technical_indicator, add_vix, and df_to_array. The Yahoo Finance processor in finrl/meta/data_processors/processor_yahoofinance.py downloads OHLCV history, handles multi-index columns returned by yfinance, and adds indicators such as MACD, RSI, CCI, ADX, and the turbulence index.
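The shared processor contract can be expressed as a `typing.Protocol`. The method names come from the list above, but the parameter lists and return types here are assumptions for illustration: FinRL's processors operate on pandas DataFrames, take more arguments such as date ranges and time intervals, and `df_to_array` also produces a turbulence array. `ToyProcessor` is a hypothetical in-memory backend used only to show that any object with these methods satisfies the contract.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DataProcessorLike(Protocol):
    """Assumed shape of the per-provider processor contract."""

    def download_data(self, ticker_list): ...
    def clean_data(self, data): ...
    def add_technical_indicator(self, data, tech_indicator_list): ...
    def add_vix(self, data): ...
    def df_to_array(self, data, tech_indicator_list): ...


class ToyProcessor:
    """Hypothetical backend: rows are dicts, no network access."""

    def download_data(self, ticker_list):
        return [{"tic": t, "close": 100.0} for t in ticker_list]

    def clean_data(self, data):
        return [row for row in data if row["close"] is not None]

    def add_technical_indicator(self, data, tech_indicator_list):
        # Placeholder values; a real backend would compute each indicator.
        return [{**row, **{name: 0.0 for name in tech_indicator_list}} for row in data]

    def add_vix(self, data):
        return [{**row, "vix": 20.0} for row in data]

    def df_to_array(self, data, tech_indicator_list):
        prices = [row["close"] for row in data]
        techs = [[row[name] for name in tech_indicator_list] for row in data]
        return prices, techs
```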
Feature Engineering and Splits
The 2026 data script downloads DOW 30 tickers from Yahoo Finance, attaches technical indicators and the VIX/turbulence index, and partitions the data into a training set (2014–2025) and a trading set (2026-01-01 to 2026-03-20) saved as train_data.csv and trade_data.csv — see examples/FinRL_StockTrading_2026_1_data.py. The default indicator set referenced in README.md includes macd, boll_ub, boll_lb, rsi_30, dx_30, close_30_sma, and close_60_sma, but users can extend this list.
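As an example of why short lookback windows leave gaps, a strict trailing moving average such as a 30-day SMA has no value until its window is full. The sketch below is a hypothetical helper, not the indicator library FinRL uses, and returns `None` for those warm-up positions; in a DataFrame pipeline the same positions appear as NaN and must be dropped or filled before the data reaches the environment.

```python
def simple_moving_average(closes, window):
    """Hypothetical trailing SMA; the first window-1 positions are None (insufficient lookback)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            running -= closes[i - window]  # drop the price leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out
```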
```mermaid
flowchart LR
    A[Data Source<br/>Yahoo / Alpaca / CCXT] --> B[DataProcessor<br/>download_data]
    B --> C[clean_data]
    C --> D[add_technical_indicator<br/>+ VIX + turbulence]
    D --> E[Train / Trade Split<br/>CSV files]
    E --> F[StockTradingEnv]
    F --> G[DRL Agent<br/>A2C / PPO / DDPG / SAC / TD3]
    G --> H[Backtest / Plot]
    G --> I[Alpaca Paper Trading]
```

End-to-End Application Workflow
Train / Test / Trade Orchestration
finrl/main.py is the command-line entry point. It accepts --mode=train, --mode=test, or --mode=trade and routes to the matching module. The pipeline is parameterised by finrl/config.py (training window, time intervals, technical-indicator list, brokerage parameters, model hyperparameters) and finrl/config_tickers.py (ticker universes such as the DOW 30).
| Step | Module | Responsibility |
|---|---|---|
| 1. Configure | finrl/config.py | Define time ranges, indicators, agent hyperparameters |
| 2. Train | finrl/train.py | Build StockTradingEnv, call DRLAgent.train_model, save checkpoints under trained_models/ |
| 3. Test | finrl/test.py | Replay a trained policy on the test set and emit account-value / action logs |
| 4. Trade / Backtest | finrl/trade.py, finrl/plot.py | Compute Sharpe ratio, cumulative return, and compare with MVO / DJIA benchmarks |
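The mode routing can be sketched with `argparse`. This is a hypothetical reconstruction rather than the contents of finrl/main.py: the handler functions are placeholders standing in for train.py, test.py, and trade.py, and restricting `--mode` with `choices` and defaulting to `train` are assumptions.

```python
import argparse


def run_train():
    return "train"  # placeholder for finrl.train


def run_test():
    return "test"  # placeholder for finrl.test


def run_trade():
    return "trade"  # placeholder for finrl.trade


HANDLERS = {"train": run_train, "test": run_test, "trade": run_trade}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="FinRL-style train/test/trade entry point")
    parser.add_argument("--mode", choices=sorted(HANDLERS), default="train",
                        help="pipeline stage to run")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv; tests pass an explicit list.
    args = build_parser().parse_args(argv)
    return HANDLERS[args.mode]()
```

Invoked as a script, `python main.py --mode=test` would dispatch to `run_test`; an unknown mode makes argparse print usage and exit.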
The trade.py step uses the same DRLAgent.DRL_prediction method to generate actions, then calls finrl/plot.py backtest_stats and backtest_plot for performance analytics. The 2026 tutorial mirrors this with a dedicated FinRL_StockTrading_2026_3_Backtest.py that compares agent returns against Mean-Variance Optimisation and DJIA — see the v0.3.8 release notes referenced in README.md.
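The two headline statistics can be computed directly from a daily account-value series. This sketch assumes 252 trading days per year, a zero risk-free rate, and the sample standard deviation, mirroring the annualized Sharpe formula commonly seen in FinRL notebooks; `backtest_summary` is a hypothetical helper and not the signature of `backtest_stats`.

```python
import math
import statistics


def backtest_summary(account_values, periods_per_year=252):
    """Hypothetical: cumulative return and annualized Sharpe ratio (risk-free rate 0)."""
    if len(account_values) < 3:
        raise ValueError("need at least three account values for two returns")
    returns = [b / a - 1.0 for a, b in zip(account_values, account_values[1:])]
    cumulative = account_values[-1] / account_values[0] - 1.0
    volatility = statistics.stdev(returns)  # sample std, like pandas .std()
    sharpe = (math.sqrt(periods_per_year) * statistics.fmean(returns) / volatility
              if volatility > 0 else float("nan"))
    return {"cumulative_return": cumulative, "sharpe": sharpe}
```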
Applications
The finrl/applications directory contains four task families referenced in finrl/README.md: stock_trading, cryptocurrency_trading, portfolio_allocation, and high_frequency_trading. Each application reuses the same three-layer structure and swaps in a domain-specific Env_* class from finrl/meta/.
Paper Trading (Alpaca)
Architecture
finrl/meta/paper_trading/alpaca.py implements an AlpacaPaperTrading class that authenticates against the Alpaca paper-trading API, streams intraday bars on a threaded schedule, queries a DRL-trained policy checkpoint for actions, and submits the resulting buy/sell orders to Alpaca. The default universe and timestamps are typically derived from finrl/config_tickers.py and the indicators configured in finrl/config.py.
Known Threading Pitfalls
The community has surfaced two related defects in the order-submission path:
- Thread target called immediately (issue #1399): the original code used `Thread(target=self.submitOrder(...))`, which invokes `submitOrder` synchronously and passes its return value, `None`, as the target. The corrected pattern is `Thread(target=self.submitOrder, args=(...))`.
- Unread response lists (issue #1414): the `respSO` lists populated by the order threads are never read after the threads are joined, so submission errors or fills are silently dropped. Contributors are advised to inspect `respSO` after `join()` to surface failures.
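Both fixes can be shown with the standard `threading` module: pass the function itself as `target` with its arguments in `args`, have each worker append its response to a shared list, and read that list after `join()`. `submit_order` below is a hypothetical stub standing in for the Alpaca call, and its rejection rule is invented for the example.

```python
import threading


def submit_order(qty, symbol, side, responses, lock):
    """Hypothetical stub for a broker call; records a result instead of trading."""
    status = "rejected" if qty <= 0 else "accepted"
    with lock:
        responses.append({"symbol": symbol, "side": side, "qty": qty, "status": status})


def submit_all(orders):
    responses, lock, threads = [], threading.Lock(), []
    for qty, symbol, side in orders:
        # Correct: the function object is the target; arguments go in args.
        # Wrong:   Thread(target=submit_order(qty, ...)) runs the call immediately
        #          on this thread and hands Thread a target of None.
        t = threading.Thread(target=submit_order, args=(qty, symbol, side, responses, lock))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Read the responses after join() so failures are surfaced, not dropped.
    failures = [r for r in responses if r["status"] != "accepted"]
    return responses, failures
```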
Market-State Safety
Issue #1412 highlights that StockTradingEnv schedules trades via the timestamp alone and does not verify whether the exchange is actually open. This causes silent failures around holidays and DST transitions. A pre-trade market-state check (e.g., via a Headless Oracle–style signed manifest) has been proposed to close this gap.
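Until such a check exists upstream, a conservative guard can refuse to trade outside the regular session. The sketch below is a hypothetical helper that takes an exchange-local `datetime` and a caller-supplied holiday set; it assumes regular hours of 09:30 to 16:00 on weekdays and ignores early closes, so a broker-provided clock (for example, Alpaca's market clock endpoint) should be preferred where available.

```python
from datetime import datetime, time

SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)


def is_regular_session(now_local: datetime, holidays=frozenset()) -> bool:
    """Hypothetical: True if `now_local` (exchange-local time) is in regular weekday hours.

    `holidays` is a collection of datetime.date values. Early closes and
    exchange-specific calendars are deliberately not modeled.
    """
    if now_local.weekday() >= 5:  # Saturday=5, Sunday=6
        return False
    if now_local.date() in holidays:
        return False
    return SESSION_OPEN <= now_local.time() < SESSION_CLOSE
```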
Common Failure Modes
Beyond the Alpaca issues above, users frequently hit:
- `reset(seed=...)` keyword error (#1013): `StockTradingEnv` does not accept a `seed` keyword on `reset` when wrapped with newer Gymnasium (`gymnasium>=0.28`). Pin to `gym==0.21` or call `env.reset()` without a seed.
- Off-policy logging crash (#1395): callbacks that read `model.rollout_buffer` fail for DDPG / TD3 / SAC, which use `replay_buffer` instead. Gate the callback on `hasattr(model, "rollout_buffer")`.
- Shape mismatches (#222, #206): mismatches between feature count and price-array length usually mean the technical-indicator list and the indicator column names in the CSV do not agree.
- Invalid `Normal` distribution (#696): NaNs in state arrays, usually from unscaled prices or missing rows in a single-stock environment, produce non-finite `loc` values.
- Short selling (#1255): an `allow_short_selling` flag on `StockTradingEnv` has been proposed to make short selling configurable.
See Also
- Data Processors and Technical Indicators: detailed coverage of `processor_yahoofinance.py`, `processor_alpaca.py`, `processor_ccxt.py`, and indicator definitions.
- Market Environments: `StockTradingEnv`, `CryptoTradingEnv`, and `PortfolioOptimizationEnv` contracts.
- Agents and Training: DRL algorithm adapters for ElegantRL, RLlib, and Stable Baselines 3.
- Hyperparameter Tuning: `finrl/agents/stablebaselines3/tune_sb3.py` Optuna integration.
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 12 structured pitfall item(s), including 6 high/blocking item(s). Top priority: Runtime risk - Runtime risk requires verification.
1. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1395
2. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1414
3. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1412
4. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/671
5. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/AI4Finance-Foundation/FinRL/issues/1013
6. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: packet_text.keyword_scan | https://github.com/AI4Finance-Foundation/FinRL
7. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/AI4Finance-Foundation/FinRL
8. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL
9. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no demo is available (no_demo).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/AI4Finance-Foundation/FinRL
10. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no demo is available (no_demo).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/AI4Finance-Foundation/FinRL
11. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue and pull-request quality could not be assessed (issue_or_pr_quality=unknown).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL
12. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release recency could not be determined (release_recency=unknown).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/AI4Finance-Foundation/FinRL
Source: Doramagic discovery, validation, and Project Pack records
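Every item above recommends reproducing the official install and quickstart path in an isolated environment. A minimal sketch of that first step, using only Python's standard-library `venv` module, follows. The directory name `finrl-check` and the helper `create_isolated_env` are illustrative assumptions, not part of FinRL.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_isolated_env(root: Path) -> Path:
    """Create a fresh virtual environment under `root` and return its interpreter path."""
    env_dir = root / "finrl-check"  # illustrative directory name (assumption)
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # to bootstrap pip before installing packages into the environment.
    venv.create(env_dir, clear=True, symlinks=(os.name != "nt"), with_pip=False)
    # Windows venvs put the interpreter in Scripts/, POSIX venvs in bin/.
    bin_dir, exe = ("Scripts", "python.exe") if os.name == "nt" else ("bin", "python")
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_isolated_env(Path(tmp))
        print(python_path, python_path.exists())
```

Once the isolated interpreter exists, run the install commands from the official README with that interpreter rather than the system Python, so any failure reflects the project and not leftover local packages.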
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using FinRL with real data or production workflows.
- Google colab Stock_NeurIPS2018_SB3.ipynb StockTradingEnv.reset() got an… - github / github_issue
- DDPG / off-policy algorithms fail due to rollout_buffer logging in FinRL - github / github_issue
- Question About Open-Sourcing the FinRL-DT Implementation - github / github_issue
- Feature: Chart pattern similarity as observation/state for RL agents - github / github_issue
- paper_trading/alpaca.py: submitOrder response list never read after thre… - github / github_issue
- Add pre-trade market state verification to StockTradingEnv - github / github_issue
- Fix thread target invocation in paper_trading/alpaca.py (submitOrder cal… - github / github_issue
- Is there a way to prevent the FinRL model from doing any Short selling - github / github_issue
- AttributeError when running "python main.py --mode=train" command - github / github_issue
- v0.3.8 - github / github_release
- Security or permission risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence
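Two of the linked issues concern `submitOrder` threading in `paper_trading/alpaca.py`. Their titles are truncated, so the exact defect is not restated here. Assuming it is the common Python mistake of calling the function while constructing the thread, the sketch below shows the difference. `submit_order` and `submit_concurrently` are hypothetical stubs that record fake responses; they do not call Alpaca.

```python
import threading
from typing import Dict, List, Tuple


def submit_order(qty: int, symbol: str, side: str, responses: List[Dict]) -> None:
    """Hypothetical stand-in for an order-submission call; records a fake response."""
    responses.append({"symbol": symbol, "qty": qty, "side": side, "status": "accepted"})


def submit_concurrently(orders: List[Tuple[int, str, str]]) -> List[Dict]:
    responses: List[Dict] = []
    threads = []
    for qty, symbol, side in orders:
        # Pitfall: threading.Thread(target=submit_order(qty, symbol, side, responses))
        # runs submit_order right away in the calling thread and hands its return
        # value (None) to Thread, which then does nothing when started.
        # Correct: pass the callable and its arguments separately.
        t = threading.Thread(target=submit_order, args=(qty, symbol, side, responses))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # wait for every order thread before reading the responses
    return responses


if __name__ == "__main__":
    for r in submit_concurrently([(10, "AAPL", "buy"), (5, "MSFT", "sell")]):
        print(r)
```

Joining every thread before returning ensures the collected responses are actually read. `list.append` is atomic under CPython's GIL; a `queue.Queue` or a `threading.Lock` is the more portable choice.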
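The first linked issue reports that `StockTradingEnv.reset()` raised an error in the Stock_NeurIPS2018_SB3 Colab notebook. The title is cut off, so the exact message is unknown. One frequent cause of `reset()` errors after upgrading dependencies is a signature mismatch: Gymnasium-era tooling, including Stable Baselines 3 2.x vectorized environments, calls `reset(seed=..., options=...)` and expects an `(observation, info)` tuple. The toy class below is a hypothetical stand-in, not FinRL's environment, showing that signature with only the standard library.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


class MinimalTradingEnv:
    """Hypothetical toy env (not FinRL's StockTradingEnv) with a Gymnasium-style reset()."""

    def __init__(self, initial_cash: float = 1_000_000.0):
        self.initial_cash = initial_cash
        self._rng = random.Random()
        self.state: List[float] = []

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[List[float], Dict[str, Any]]:
        # Keyword-only seed/options match the Gymnasium reset() contract, so
        # wrappers that call reset(seed=...) do not raise a TypeError.
        if seed is not None:
            self._rng.seed(seed)
        # Toy state: cash plus one randomly perturbed "price".
        price = 100.0 + self._rng.uniform(-1.0, 1.0)
        self.state = [self.initial_cash, price]
        return self.state, {}  # (observation, info)


if __name__ == "__main__":
    env = MinimalTradingEnv()
    print(env.reset(seed=0))
```

In a real `gymnasium.Env` subclass, calling `super().reset(seed=seed)` inside `reset()` also seeds the built-in `self.np_random` generator.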
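Another linked issue reports DDPG and other off-policy algorithms failing because of `rollout_buffer` logging. In Stable Baselines 3, on-policy algorithms (A2C, PPO) keep experience in a `rollout_buffer` attribute, while off-policy algorithms (DDPG, TD3, SAC, DQN) use a `replay_buffer`. Logging code that assumes `rollout_buffer` therefore breaks for the latter. A defensive lookup is sketched below; `experience_buffer` is a hypothetical helper, and `SimpleNamespace` objects stand in for real models.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def experience_buffer(model: Any) -> Tuple[str, Optional[Any]]:
    """Return (attribute name, buffer) for an SB3-style model, or ("none", None)."""
    # On-policy algorithms (A2C, PPO) expose `rollout_buffer`;
    # off-policy algorithms (DDPG, TD3, SAC, DQN) expose `replay_buffer`.
    for name in ("rollout_buffer", "replay_buffer"):
        buf = getattr(model, name, None)
        if buf is not None:
            return name, buf
    return "none", None


if __name__ == "__main__":
    ppo_like = SimpleNamespace(rollout_buffer="rollout data")
    ddpg_like = SimpleNamespace(replay_buffer="replay data")
    print(experience_buffer(ppo_like))   # → ('rollout_buffer', 'rollout data')
    print(experience_buffer(ddpg_like))  # → ('replay_buffer', 'replay data')
```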
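One linked question asks how to stop a FinRL model from short selling. Whether a given environment already caps sells at current holdings depends on the environment and version, so check its sell logic first. If an explicit guard is needed, a minimal sketch under an assumed convention (positive action = buy shares, negative action = sell shares) clips each sell to the shares held. `clip_to_long_only` is a hypothetical helper, not a FinRL API.

```python
from typing import List


def clip_to_long_only(actions: List[int], holdings: List[int]) -> List[int]:
    """Clip per-asset share orders so that no position goes below zero.

    Assumed convention: positive = buy that many shares, negative = sell.
    """
    if len(actions) != len(holdings):
        raise ValueError("actions and holdings must have the same length")
    # A sell can never exceed the shares currently held, so the floor is -held.
    return [max(a, -held) for a, held in zip(actions, holdings)]


if __name__ == "__main__":
    print(clip_to_long_only([-10, 5, -3], [4, 0, 3]))  # → [-4, 5, -3]
```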