deep learning and scientific computing framework with native CPU and GPU backend for the Scala programming language
Lamp is a Scala library for deep learning and scientific computing. It features a native CPU and GPU backend and operates on off-heap memory.
Lamp is inspired by PyTorch. The foundation of lamp is a JNI binding to ATen, the C++ tensor backend of PyTorch (see here). As a consequence, lamp uses fast CPU and GPU code and stores its data in off-heap memory.
Lamp is built both for Scala 2 and Scala 3.
Lamp provides CPU- or GPU-backed n-dimensional arrays and implements generic automatic reverse-mode differentiation (also known as autograd, see e.g. this paper). Lamp may be used for scientific computing similarly to NumPy, or to build neural networks.
It also provides neural network components.
This repository also hosts some other loosely related libraries.
Lamp depends on the JNI bindings in aten-scala, which has cross-compiled artifacts for Mac (both x86 and M1) and Linux. Mac has no GPU support. Your system has to have the libtorch 1.12.1 shared libraries in /usr/local/lib/. See the following Dockerfile on how to do that.
In addition to the libtorch shared libraries:
- lamp-core depends on cats-effect and aten-scala
- lamp-data further depends on scribe and jsoniter-scala
The machine generated ATen JNI binding (aten-scala) exposes hundreds of tensor operations from libtorch. On top of those tensors lamp provides autograd for the operations needed to build neural networks.
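To illustrate what reverse-mode differentiation does, here is a minimal, self-contained sketch on scalar values. It does not use lamp's API: the names ReverseModeSketch, Node, leaf, tanh and backward are hypothetical and exist only for this illustration. Each node records its parents and the local derivatives, and the backward pass visits nodes from the output towards the inputs, accumulating gradients along the way.

```scala
// Illustration only: a tiny scalar reverse-mode autodiff, NOT lamp's API.
object ReverseModeSketch {
  // A node holds a value and, for each parent, the local derivative
  // d(this)/d(parent) recorded when the node was created.
  final class Node(val value: Double, val parents: List[(Node, Double)]) {
    var grad: Double = 0.0
    def +(that: Node): Node =
      new Node(value + that.value, List((this, 1.0), (that, 1.0)))
    def *(that: Node): Node =
      new Node(value * that.value, List((this, that.value), (that, value)))
  }

  def leaf(v: Double): Node = new Node(v, Nil)

  def tanh(x: Node): Node = {
    val t = math.tanh(x.value)
    new Node(t, List((x, 1.0 - t * t)))
  }

  // Visits nodes so that every node comes before its parents, then
  // pushes each node's gradient to its parents (chain rule).
  def backward(output: Node): Unit = {
    val visited = scala.collection.mutable.Set.empty[Node]
    var order = List.empty[Node]
    def visit(n: Node): Unit =
      if (visited.add(n)) {
        n.parents.foreach { case (p, _) => visit(p) }
        order = n :: order
      }
    visit(output)
    output.grad = 1.0
    order.foreach { n =>
      n.parents.foreach { case (p, local) => p.grad += local * n.grad }
    }
  }
}
```

For f(x, y) = x * y + tanh(x), calling backward on f leaves df/dx = y + 1 - tanh(x)^2 in x.grad and df/dy = x in y.grad. Lamp applies the same idea to tensors rather than scalars.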
Test coverage is substantial: it consists of unit tests and a suite of end-to-end tests that compare lamp to PyTorch on 50 datasets. All gradient operations and neural network modules are tested for correctness using numeric differentiation, both on CPU and GPU. Nevertheless, proceed with caution.
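The idea behind checking gradients with numeric differentiation can be sketched as follows. This is not lamp's test code: GradientCheckSketch, numericGradient and maxAbsDifference are hypothetical names, and the step size 1e-6 is an assumed default. The sketch approximates each partial derivative with a central finite difference, so that the result can be compared against an analytically derived gradient.

```scala
// Illustration only: central finite differences for gradient checking.
object GradientCheckSketch {
  // Approximates df/dx_i as (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps).
  def numericGradient(
      f: Array[Double] => Double,
      x: Array[Double],
      eps: Double = 1e-6
  ): Array[Double] =
    x.indices.map { i =>
      val plus = x.clone()
      plus(i) += eps
      val minus = x.clone()
      minus(i) -= eps
      (f(plus) - f(minus)) / (2 * eps)
    }.toArray

  // Largest element-wise absolute difference between two gradients.
  def maxAbsDifference(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => math.abs(u - v) }.max
}
```

A gradient implementation passes the check when maxAbsDifference between its output and numericGradient stays below a small tolerance for a set of test inputs.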
Add to build.sbt:
libraryDependencies += "io.github.pityka" %% "lamp-data" % "VERSION" // look at the github page for version
The following artifacts are published to Maven Central from this repository:
- "io.github.pityka" %% "lamp-sten": provides the native tensor data type
- "io.github.pityka" %% "lamp-core": provides autograd and neural network components
- "io.github.pityka" %% "lamp-data": provides training loops and data loading facilities
- "io.github.pityka" %% "lamp-saddle": provides integration with saddle

- "io.github.pityka" %% "lamp-knn": provides k nearest neighbor
- "io.github.pityka" %% "lamp-umap": UMAP implementation
- "io.github.pityka" %% "extratrees": extremely randomized trees implementation
- "io.github.pityka" %% "lamp-akka": helper utilities for distributed training

All artifacts are published for Scala 2.13 and Scala 3.
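When a project needs more than one of these artifacts, the dependencies can share a single version value. The following build.sbt fragment is a sketch under two assumptions: lampVersion is a hypothetical name, and the artifacts are assumed to be released under the same version number. "VERSION" remains a placeholder for the actual release.

```scala
// build.sbt
// "VERSION" is a placeholder; look up the current release on the GitHub page.
val lampVersion = "VERSION"

libraryDependencies ++= Seq(
  "io.github.pityka" %% "lamp-core" % lampVersion, // autograd and neural network components
  "io.github.pityka" %% "lamp-data" % lampVersion  // training loops and data loading
)
```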
sbt test runs a short suite of unit tests.
CUDA tests are run separately with sbt cuda:test. See test_cuda.sh in the source tree for how to run them in a remote Docker context. Some additional tests are run from test_slow.sh.
All tests are executed with sbt alltest:test. This runs all unit tests, all CUDA tests, additional tests marked as slow, and a more extensive end-to-end benchmark against PyTorch itself on 50 datasets.
Examples for various tasks:
- bash run_cifar.sh runs the code in example-cifar100/.
- bash run_timemachine.sh runs the code in example-timemachine/.
- bash run_arxiv.sh runs the code in example-arxiv/.
- bash run_cifar_dist1.sh and bash run_cifar_dist2.sh.
- example-bert and example-autoregressivelm.

The project in folder example-autoregressivelm is a fully working example of self-supervised pretraining of a medium-sized language model similar to GPT-2. It can leverage multiple GPUs, either driven from the same process or distributed across processes.
First, one has to build the JNI binding to libtorch, then build lamp itself.
The JNI binding is hosted in the pityka/aten-scala git repository. Refer to the readme in that repository on how to build the JNI sources and publish them as a scala library.
Lamp itself is a pure Scala library and builds like any other Scala project.
Once aten-scala is published to a local repository, invoking sbt compile will work.
If you modified the package name, version or organization in the aten-scala build, then you have to adjust the build definition of lamp.
See the LICENSE file. Licensed under the MIT License.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.