High performance model preprocessing library on PyTorch
BSD-3-CLAUSE License
We are excited to release the very first Beta version of TorchArrow! TorchArrow is a machine learning preprocessing library for batch data, providing a performant, Pandas-style, easy-to-use API for model development.
TorchArrow provides a Python DataFrame that supports extensible UDFs, backed by Velox, with the following features:
In this release, we support installation via PyPI: pip install torcharrow.
You can find the API documentation here.
This 10-minute tutorial provides a short introduction to TorchArrow, and you can also try it in this Colab.
You can find an example of integrating a TorchRec-based training loop with TorchArrow's on-the-fly preprocessing here. More examples are coming soon!
We hope to continue expanding the library, hardening the API, and gathering feedback to enable future releases. Stay tuned!
TorchArrow is currently in the Beta stage and does not yet have a stable release. The API may change based on user feedback or performance considerations. We are committed to bringing this library to a stable release, but future changes may not be fully backward compatible. If you have suggestions on the API or use cases you'd like to see covered, please open a GitHub issue. We'd love to hear your thoughts and feedback.