Rapid large-scale fractional differencing with NVIDIA RAPIDS and GPUs to minimize memory loss while making a time series stationary. 6x to 400x speed-up over a CPU implementation.
MIT License
This is a GPU implementation of fractional differencing (we call it GFD). It enables rapid, large-scale fractional differencing that minimizes memory loss while achieving stationarity for time series data.
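For reference, fractional differencing (Hosking) applies the binomial expansion of $(1-B)^d$, where $B$ is the backshift operator, and the fixed-window variant described by López de Prado truncates the weights once they become negligible. The notation below is ours, not taken from the notebook:

$$
(1 - B)^d X_t = \sum_{k=0}^{\infty} w_k X_{t-k}, \qquad w_0 = 1, \qquad w_k = -w_{k-1}\,\frac{d - k + 1}{k}
$$

$$
\tilde{X}_t = \sum_{k=0}^{K} w_k X_{t-k}
$$

Here $K$ is the last index before the first weight with $|w_k|$ below a threshold (the `floor` parameter). With $d = 1$ the weights are $1, -1, 0, \dots$, which is the ordinary first difference; with $0 < d < 1$ they decay slowly, which is what preserves memory.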
Run the whole tutorial in a self-contained Jupyter Notebook on Google Colaboratory by pressing the button above. The notebook covers the whole process, including pulling all data and dependencies and running the GFD code, so you can run it as is.
Time taken in seconds for each number of data points. You should be able to reach similar multipliers on Google Colab or on more powerful machines via GCP, AWS or your local servers/machines.
| Hardware | 100k | 1m | 10m | 100m |
|---|---|---|---|---|
| GCP 8x vCPUs | 9.18 | 89.62 | 891.24 | 9803.11 |
| GCP 1x T4 GPU | 1.44 | 1.33 | 3.75 | 29.88 |
| GCP 1x V100 GPU | 0.93 | 1.07 | 3.17 | 23.81 |
| Speed-up 1x T4 vs 8x vCPUs | 6.38x | 67.38x | 237.66x | 328.08x |
| Speed-up 1x V100 vs 8x vCPUs | 9.87x | 83.76x | 281.15x | 411.72x |
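The speed-up rows are simply CPU time divided by GPU time for the same number of data points. A quick check in Python, using only the times from the table (the `speedup` helper is just for illustration):

```python
# Times in seconds, copied from the table (8x vCPUs vs a single GPU).
cpu = {"100k": 9.18, "1m": 89.62, "10m": 891.24, "100m": 9803.11}
t4 = {"100k": 1.44, "1m": 1.33, "10m": 3.75, "100m": 29.88}
v100 = {"100k": 0.93, "1m": 1.07, "10m": 3.17, "100m": 23.81}


def speedup(cpu_times, gpu_times):
    """Speed-up = CPU time / GPU time for each data size."""
    return {n: cpu_times[n] / gpu_times[n] for n in cpu_times}


for name, gpu in [("T4", t4), ("V100", v100)]:
    ratios = speedup(cpu, gpu)
    print(name, {n: f"{r:.2f}x" for n, r in ratios.items()})
```

Note how the GPU times stay almost flat from 100k to 1m points while the CPU time grows roughly linearly, which is why the multiplier widens with data size.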
Full credit to NVIDIA, who built on our work and sped things up further, reaching almost a 10,000x speed-up over a CPU implementation. You can find NVIDIA's more complex and less intuitive, but highly performant, version of GFD in this notebook.
We've created a simple function in the notebook: pass your pandas DataFrame into the function and it returns a fractionally differenced time series DataFrame.
- `d`: fractional differencing value. 0 means no differencing, integer values such as 1 mean integer differencing, and anything between 0 and 1 is fractional differencing.
- `floor`: minimum weight magnitude for fixed-window fractional differencing. Weights whose absolute value falls below this are ignored, which fixes the window length.
- `df_raw`: the input DataFrame. It must be indexed from lag k (oldest time) at the top to lag 0 (latest time) at the bottom for this function to work correctly.
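To see how `d` and `floor` together determine the window length, here is a minimal NumPy sketch of the weight computation, assuming the standard recurrence $w_k = -w_{k-1}(d - k + 1)/k$ with truncation at the first weight below `floor`. The name `fixed_window_weights` is hypothetical and is not claimed to be the notebook's helper.

```python
import numpy as np


def fixed_window_weights(d, floor):
    """Hypothetical helper: fixed-window fractional differencing weights.

    Uses w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k and stops at the first
    weight whose absolute value is below `floor`. The result is ordered
    oldest lag first (w_K, ..., w_1, w_0) to match a window laid out from
    lag K at the top to lag 0 at the bottom.
    """
    if floor <= 0:
        # With floor <= 0 the loop below would never terminate.
        raise ValueError("floor must be positive")
    ws = [1.0]
    k = 1
    while True:
        w_k = -ws[-1] * (d - k + 1) / k
        if abs(w_k) < floor:
            break
        ws.append(w_k)
        k += 1
    return np.array(ws[::-1])


print(len(fixed_window_weights(0.5, 5e-5)))   # a few hundred weights
print(fixed_window_weights(1.0, 5e-5))        # [-1.  1.], a first difference
```

A smaller `floor` keeps more weights (a longer window, more memory retained) at the cost of more computation per output row.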
```python
# GPU implementation (GFD)
gfd, weights = frac_diff_gpu(df_raw, d=0.5, floor=5e-5)

# CPU implementation, for comparison
fd, weights = frac_diff(df_raw, d=0.5, floor=5e-5)
```
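The notebook's `frac_diff` internals are not reproduced here. As a rough mental model of what such a CPU call computes, here is a minimal pandas/NumPy sketch of fixed-window fractional differencing. `frac_diff_sketch` is a hypothetical name, and leaving the first rows (which lack a full window) as NaN is our assumption, not necessarily what `frac_diff` does.

```python
import numpy as np
import pandas as pd


def frac_diff_sketch(df_raw, d=0.5, floor=5e-5):
    """Hypothetical CPU sketch of fixed-window fractional differencing.

    Assumes df_raw is ordered from lag k (oldest, top) to lag 0 (latest,
    bottom). Returns (differenced DataFrame, weights); the first
    len(weights) - 1 rows have no full window and are left as NaN.
    """
    if floor <= 0:
        raise ValueError("floor must be positive")
    # Weights via w_k = -w_{k-1} * (d - k + 1) / k, truncated at |w_k| < floor.
    ws = [1.0]
    k = 1
    while True:
        w_k = -ws[-1] * (d - k + 1) / k
        if abs(w_k) < floor:
            break
        ws.append(w_k)
        k += 1
    weights = np.array(ws[::-1])  # oldest lag first, w_0 last
    width = len(weights)

    out = {}
    for col in df_raw.columns:
        x = df_raw[col].to_numpy(dtype=float)
        y = np.full(len(x), np.nan)
        # Each output row is the weighted sum of the window ending at that row.
        for t in range(width - 1, len(x)):
            y[t] = np.dot(weights, x[t - width + 1 : t + 1])
        out[col] = y
    return pd.DataFrame(out, index=df_raw.index), weights


# Usage: sort_index() puts the oldest row at the top, as required.
df_raw = pd.DataFrame({"close": [10.0, 11.0, 13.0, 12.0]},
                      index=pd.date_range("2020-01-01", periods=4))
fd, weights = frac_diff_sketch(df_raw.sort_index(), d=1.0, floor=5e-5)
print(fd["close"].tolist())  # [nan, 1.0, 2.0, -1.0], an ordinary first difference
```

Each output row depends only on its own window, so every row can be computed independently. That independence is what makes the computation a good fit for one GPU thread per row.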
The next release will use multiple 1D blocks in a 1D grid instead of a single 1D block of 518/1024 threads. This will help users understand the difference between multiple blocks and a single block.
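To make the difference concrete without a GPU, the plain-Python sketch below only mimics how CUDA maps threads to array elements; it does not launch any kernel. The function names are hypothetical. The global index formula `blockIdx.x * blockDim.x + threadIdx.x` is standard CUDA 1D indexing, and a single block is limited to 1024 threads on current NVIDIA GPUs, so with one block each thread has to stride through several elements.

```python
import math


def single_block_assignment(n, threads_per_block):
    """One 1D block: thread t handles t, t + blockDim, t + 2 * blockDim, ..."""
    return {t: list(range(t, n, threads_per_block)) for t in range(threads_per_block)}


def multi_block_assignment(n, threads_per_block):
    """1D grid of 1D blocks: global index = blockIdx.x * blockDim.x + threadIdx.x."""
    blocks_per_grid = math.ceil(n / threads_per_block)
    work = {}
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds guard: the last block may be partly idle
                work[(block_idx, thread_idx)] = [i]
    return blocks_per_grid, work


n, tpb = 10_000, 1024
single = single_block_assignment(n, tpb)
blocks, multi = multi_block_assignment(n, tpb)
print(max(len(v) for v in single.values()))  # 10 elements per thread, one block
print(blocks)                                # 10 blocks, one element per thread
```

Both layouts cover every element exactly once; the multi-block layout lets the GPU schedule blocks across all of its streaming multiprocessors instead of serializing work inside one block.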
Beyond the next release, we plan to explain the use of multi-dimensional blocks and grids.
If you use the code, please cite it using this link, alongside the Prado and Hosking papers.