🧙 Scalable machine learning forecasting framework with PySpark
MIT License
ForecastFlowML is a scalable machine learning forecasting framework built on PySpark. It trains scikit-learn-like models in parallel by distributing models rather than data.
With ForecastFlowML, you can build scikit-learn-like regressors as direct multi-step forecasters and train a separate model for each group in your dataset. The package uses PySpark to handle large datasets efficiently and to distribute model training for faster results.
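To make the two ideas above concrete, here is a minimal pure-Python sketch of direct multi-step forecasting with one set of models per group. It is not ForecastFlowML's API: the helper names (`fit_line`, `fit_direct_forecasters`, `forecast`), the toy data, and the single lag feature are all assumptions made for illustration, and a one-feature least-squares fit stands in for a scikit-learn-like regressor. It also runs sequentially, whereas the package distributes the per-group training with PySpark.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    if var_x == 0:
        # Constant feature: fall back to predicting the mean target.
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / var_x
    return y_bar - slope * x_bar, slope


def fit_direct_forecasters(series, horizon):
    """Direct strategy: fit a separate model for each step h = 1..horizon."""
    models = {}
    for h in range(1, horizon + 1):
        xs = series[:-h]  # feature: value at time t
        ys = series[h:]   # target: value at time t + h
        models[h] = fit_line(xs, ys)
    return models


def forecast(models, last_value):
    """Predict every horizon step from the last observed value."""
    return [a + b * last_value for _, (a, b) in sorted(models.items())]


# Hypothetical toy data: (group, value) rows in time order.
rows = [("store_a", v) for v in (10.0, 12.0, 14.0, 16.0, 18.0, 20.0)]
rows += [("store_b", 5.0)] * 5

# Collect one series per group.
series_by_group = defaultdict(list)
for group, value in rows:
    series_by_group[group].append(value)

# Train a separate set of horizon models for each group, then forecast.
forecasts = {}
for group, series in series_by_group.items():
    models = fit_direct_forecasters(series, horizon=2)
    forecasts[group] = forecast(models, series[-1])
```

Because each horizon step has its own model, no prediction is fed back in as an input, which is what distinguishes the direct strategy from a recursive one. Each group's models are also independent of every other group's, so the per-group fits can be run in parallel.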
ForecastFlowML provides a range of features that make it a powerful and flexible tool for time-series forecasting, including:
Support for scikit-learn-like libraries such as LightGBM or XGBoost.
For details, refer to our latest documentation.
Kaggle Walmart M5 Forecasting Competition (18th solution)
You can install the package using the following command:
pip install forecastflowml
Make sure Java 11 is installed. You can check your Java installation with the following command:
java -version
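If you would rather run this check from Python, for example at the top of a script, the following is one possible sketch. The function names `parse_java_major` and `installed_java_major` are hypothetical helpers, not part of ForecastFlowML, and the parser assumes the usual `version "..."` line that common JDK builds print.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x, for example "1.8.0_371".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and parse it; return None if Java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` returns an integer such as `11` when Java is found, so a script can compare it against the required version before starting Spark.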
In your Python script, set the PYSPARK_PYTHON environment variable to the path of your Python executable before creating the Spark session:
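import os
import sys
from pyspark.sql import SparkSession
# Make the Spark workers use the same Python interpreter as this script.
os.environ["PYSPARK_PYTHON"] = sys.executable
# Start a local Spark session that uses all available cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()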