
ForecastFlowML: Scalable Machine Learning Forecasting with PySpark

ForecastFlowML is a scalable machine learning forecasting framework built on PySpark. It trains scikit-learn-like models in parallel by distributing models rather than data.

With ForecastFlowML, you can build scikit-learn-like regressors as direct multi-step forecasters and train a separate model for each group in your dataset. The package uses PySpark to handle large datasets efficiently and to distribute model training across workers.
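To make "direct multi-step forecasting with a separate model per group" concrete, here is a minimal pure-Python sketch. It does not use ForecastFlowML or PySpark. The helper names (`fit_line`, `fit_direct`, `forecast_direct`), the toy data, and the single-lag linear model are all assumptions chosen for illustration. The idea shown is that each group gets its own set of models, one per forecast step, rather than one model applied recursively.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x, returned as (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant input: fall back to predicting the mean of the target.
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return mean_y - slope * mean_x, slope


def fit_direct(series, horizon):
    """Fit one model per step h, each predicting y[t + h] from y[t]."""
    models = {}
    for h in range(1, horizon + 1):
        models[h] = fit_line(series[:-h], series[h:])
    return models


def forecast_direct(models, last_value):
    """Forecast every step from the last observed value, one model per step."""
    return [a + b * last_value for _, (a, b) in sorted(models.items())]


# Hypothetical toy data: (group, value) rows already sorted by time.
rows = [("store_a", v) for v in (1, 2, 3, 4, 5, 6)]
rows += [("store_b", v) for v in (10, 20, 30, 40, 50, 60)]

by_group = defaultdict(list)
for group, value in rows:
    by_group[group].append(float(value))

# One independent set of models per group; these fits do not share state,
# which is what makes them easy to run in parallel.
group_models = {g: fit_direct(s, horizon=2) for g, s in by_group.items()}
forecasts = {g: forecast_direct(group_models[g], by_group[g][-1]) for g in by_group}
print(forecasts)
```

Because the per-group fits are independent, a distributed engine can hand each group to a different worker, which is the sense in which models rather than data are distributed.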

Features

ForecastFlowML provides a range of features that make it a powerful and flexible tool for time-series forecasting, including:

Scalable and extensive time series feature engineering (lag, rolling mean/std, stockout, history length) with PySpark.
Parallel model training per group in the dataset with PySpark pandas UDFs.
Direct multi-step forecasting.
Built-in time-based cross-validation.
Hyperparameter tuning for each group model with grid search.
Support for scikit-learn-like libraries such as LightGBM and XGBoost.
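Two of the features listed above, lag and rolling-mean features and time-based cross-validation, can be sketched in plain Python. This is an illustration of the concepts only, not ForecastFlowML's implementation or API. The function names, the `shift` parameter, and the expanding-window split scheme are assumptions.

```python
def lag(values, k):
    """Value observed k steps earlier; None where history is too short."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling_mean(values, window, shift=1):
    """Mean of `window` past values, ending `shift` steps before each row.

    Shifting by at least 1 keeps the current target out of its own feature.
    """
    out = []
    for i in range(len(values)):
        end = i - shift + 1  # exclusive end of the window
        start = end - window
        out.append(sum(values[start:end]) / window if start >= 0 else None)
    return out


def time_based_splits(n_rows, n_splits, test_size):
    """Expanding-window folds: train on everything before each test block."""
    folds = []
    for j in range(n_splits):
        test_end = n_rows - (n_splits - 1 - j) * test_size
        test_start = test_end - test_size
        folds.append((list(range(test_start)), list(range(test_start, test_end))))
    return folds


sales = [10, 12, 14, 16, 18, 20]  # hypothetical series, sorted by time
lag_1 = lag(sales, 1)
roll_2 = rolling_mean(sales, window=2)
folds = time_based_splits(len(sales), n_splits=2, test_size=1)

for train_idx, test_idx in folds:
    # Every training row precedes every test row, so no future data leaks in.
    print("train:", train_idx, "test:", test_idx)
```

Unlike a shuffled k-fold split, each fold here validates on rows that come strictly after its training rows, which matches how a forecaster is used in practice.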

Documentation

See our latest documentation here.

User Guides

What is ForecastFlowML?

Feature Engineering

Time Series Cross Validation

Grid Search

Feature Importance

Save/Load ForecastFlowML

Examples

Kaggle Walmart M5 Forecasting Competition (18th solution)

Retail Demand Forecasting

Installation

ForecastFlowML installation

You can install the package using the following command:

pip install forecastflowml

Check Java

Make sure Java 11 is installed. You can check which Java version, if any, is available with the following command:

java -version
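If you prefer to run the same check from Python, for example at the top of a script before starting Spark, the following is a minimal sketch using only the standard library. The function name `java_version_banner` is hypothetical. The sketch only reports what `java -version` prints and does not verify that the version is 11.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the text printed by `java -version`, or None if Java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=False
    )
    # Common JDKs write the version banner to stderr rather than stdout.
    return (result.stderr or result.stdout).strip()


banner = java_version_banner()
if banner is None:
    print("Java was not found on PATH; install Java 11 before using PySpark.")
else:
    print(banner)
```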

Set PYSPARK_PYTHON

In your Python script, set the PYSPARK_PYTHON environment variable to the path of your Python executable before creating the Spark session:

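import os
import sys
from pyspark.sql import SparkSession

# Point Spark workers at the same Python interpreter that runs this script.
os.environ["PYSPARK_PYTHON"] = sys.executable
# Start (or reuse) a local Spark session that uses all available cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()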

Package Rankings

Top 29.22% on Pypi.org
