FactorLib is a Python library designed for efficient and scalable factor analysis and walk-forward optimization (WFO) in quantitative finance.
- Parallel Processing: Leverages Ray for parallel data processing, enabling faster factor creation and model training.
- Customizable Factors: Create custom factors by extending the BaseFactor class and implementing the generate_data method.
- Walk-Forward Optimization: Perform WFO with various machine learning models (e.g., LightGBM, XGBoost) to evaluate factor performance.
- Portfolio Optimization: Supports multiple portfolio optimization methods, including Mean-Variance Optimization (MVO), Hierarchical Risk Parity (HRP), and Inverse Variance weighting.
- Performance Analysis: Generate comprehensive performance statistics and visualizations using QuantStats and SHAP.
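To make the simplest of the portfolio optimization methods listed above concrete, here is a minimal sketch of inverse variance weighting, where each asset's weight is proportional to the reciprocal of its return variance. It does not use FactorLib's own implementation; the function name, the tickers, and the synthetic returns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def inverse_variance_weights(returns: pd.DataFrame) -> pd.Series:
    """Weight each asset proportionally to 1 / variance of its returns."""
    variances = returns.var(ddof=1)   # per-column sample variance
    inv_var = 1.0 / variances
    return inv_var / inv_var.sum()    # normalize so the weights sum to 1


# Synthetic returns for three hypothetical tickers (not real market data).
rng = np.random.default_rng(0)
returns = pd.DataFrame({
    "AAA": rng.normal(0.0, 0.01, 250),
    "BBB": rng.normal(0.0, 0.02, 250),
    "CCC": rng.normal(0.0, 0.04, 250),
})
weights = inverse_variance_weights(returns)
print(weights.round(3))
```

The least volatile asset receives the largest weight. Unlike MVO or HRP, this method ignores correlations between assets.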
Before installing FactorLib, ensure you have the following dependencies installed:
- pandas
- numpy
- scikit-learn
- scipy
- xgboost
- ray
- tqdm
- jupyter
- shap
- catboost
- lightgbm
- quantstats
- matplotlib
- pyarrow
- fastparquet
- ipywidgets
- yfinance
- prettytable
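One way to check which of these packages are already present is to look up each module without importing it. The snippet below is a small sketch using only the standard library; the helper name is hypothetical, and the list uses import names, which differ from the pip names in a few cases (scikit-learn is imported as sklearn). jupyter is left out because it is normally used as an application rather than imported.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names for the dependencies listed above.
REQUIRED = [
    "pandas", "numpy", "sklearn", "scipy", "xgboost", "ray", "tqdm",
    "shap", "catboost", "lightgbm", "quantstats", "matplotlib",
    "pyarrow", "fastparquet", "ipywidgets", "yfinance", "prettytable",
]

missing = missing_modules(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```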
To install these dependencies using pip:
pip install -r requirements.txt
- Clone the repository:
git clone https://github.com/your_username_/Project-Name.git
- Install the required packages:
pip install -r requirements.txt
The FactorLib codebase is organized into several modules, each serving a specific purpose:
- factorlib/base_factor.py: Provides the BaseFactor class, which serves as the foundation for creating custom factors.
- factorlib/factor.py: Defines the Factor class, representing a single factor within the FactorModel.
- factorlib/factor_model.py: Contains the FactorModel class, responsible for managing factors, model training, and WFO.
- factorlib/stats.py: Implements the Statistics class for performance analysis and reporting.
- factorlib/types.py: Defines various enumerations and types used throughout the library.
- factorlib/utils/: Contains utility functions and modules for data handling, system operations, and datetime manipulations.
- scripts/data/cleaner.py: Provides scripts for cleaning and preprocessing raw data.
- system_test.py: Offers an example of how to use FactorLib for factor analysis and WFO.
To get started, you can follow these steps:
- Create Custom Factors: Extend the BaseFactor class and implement the generate_data method to define your factor logic.
- Prepare Data: Ensure your data is formatted according to the requirements of the Factor class.
- Build Factor Model: Instantiate a FactorModel and add your custom factors using the add_factor method.
- Walk-Forward Optimization: Call the wfo method on your FactorModel to perform WFO and evaluate factor performance.
- Analyze Results: Use the Statistics class to generate performance reports and visualizations.
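The first step above can be sketched as follows. Because the exact constructor and signature of BaseFactor are not shown here, the example defines its own stand-in base class; BaseFactorStandIn, MomentumFactor, and the window parameter are all hypothetical, and only the idea of overriding generate_data comes from the steps above.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BaseFactorStandIn(ABC):
    """Hypothetical stand-in for factorlib's BaseFactor; the real API may differ."""

    def __init__(self, name: str, data: pd.DataFrame):
        self.name = name
        self.data = data

    @abstractmethod
    def generate_data(self) -> pd.DataFrame:
        """Return the factor values, one column per ticker."""


class MomentumFactor(BaseFactorStandIn):
    """Trailing return over `window` periods, computed per ticker."""

    def __init__(self, data: pd.DataFrame, window: int = 3):
        super().__init__(name=f"momentum_{window}", data=data)
        self.window = window

    def generate_data(self) -> pd.DataFrame:
        # pct_change(periods=n) gives price[t] / price[t - n] - 1 for each column.
        return self.data.pct_change(periods=self.window)


# Synthetic prices for two hypothetical tickers.
prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 102.0, 103.0, 104.0],
     "BBB": [50.0, 50.0, 55.0, 55.0, 60.0]},
    index=pd.date_range("2020-01-31", periods=5, freq="D"),
)
factor = MomentumFactor(prices, window=3)
values = factor.generate_data()
print(values)
```

The first `window` rows are NaN because no earlier price exists to compare against. With the real library, the subclass would extend BaseFactor from factorlib/base_factor.py instead of the stand-in.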
The system_test.py script demonstrates a basic example of using FactorLib:
import pandas as pd
from datetime import datetime
from factorlib.factor import Factor
from factorlib.factor_model import FactorModel
from factorlib.types import PortOptOptions, ModelType
from factorlib.utils.system import get_raw_data_dir, get_experiments_dir
# ... (load data and create factors) ...
factor_model = FactorModel(name='test_00', tickers=tickers, interval=INTERVAL, model_type=ModelType.lightgbm)
# ... (add factors to the model) ...
stats = factor_model.wfo(returns,
                         train_interval=pd.DateOffset(years=5), train_freq='M', anchored=False,
                         start_date=datetime(2017, 1, 5), end_date=datetime(2022, 12, 20),
                         candidates=candidates,
                         save_dir=get_experiments_dir(), **kwargs,
                         port_opt=PortOptOptions.MeanVariance)
This script loads sample data, creates factors, builds a factor model, performs WFO, and generates performance statistics.
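To illustrate what the train_interval, train_freq, and anchored arguments typically mean in walk-forward optimization, the sketch below generates the training and test windows for a rolling or anchored scheme. It is not the logic inside FactorModel.wfo, which is not shown here. The function name and its parameters are hypothetical, and it assumes that the given start date is the first out-of-sample date, with training data reaching back one train_interval before it.

```python
from datetime import datetime

import pandas as pd


def walk_forward_windows(first_test_start, end_date, train_interval, step, anchored=False):
    """Yield (train_start, test_start, test_end) tuples.

    Each model is trained on [train_start, test_start) and evaluated on
    [test_start, test_end). With anchored=True the training window always
    begins at the same date and grows; otherwise it rolls forward with a
    fixed length of train_interval.
    """
    test_start = pd.Timestamp(first_test_start)
    end = pd.Timestamp(end_date)
    anchor = test_start - train_interval
    while test_start < end:
        test_end = min(test_start + step, end)   # last window may be shorter
        train_start = anchor if anchored else test_start - train_interval
        yield train_start, test_start, test_end
        test_start = test_start + step


# Monthly retraining with a five-year rolling training window.
windows = list(walk_forward_windows(
    first_test_start=datetime(2017, 1, 5),
    end_date=datetime(2017, 4, 20),
    train_interval=pd.DateOffset(years=5),
    step=pd.DateOffset(months=1),
    anchored=False,
))
for train_start, test_start, test_end in windows:
    print(train_start.date(), test_start.date(), test_end.date())
```

Setting anchored=True keeps every training window starting at the same date, so later models see progressively more history.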
Contributions to FactorLib are welcome! Please refer to the contribution guidelines for more information.