Various data science tools bundled in a single container: TensorFlow with GPU support, Jupyter, IPython, Scoop, h5py, pandas, scikit, TFLearn, plotly...
MIT License
This container was created to support data science experimentation, mainly in the context of Kaggle competitions.
Bundled tools:
CPU only:
version: "3"
services:
  datascience-tools:
    image: flaviostutz/datascience-tools
    ports:
      - 8888:8888
      - 6006:6006
    volumes:
      - /notebooks:/notebooks
    environment:
      - JUPYTER_TOKEN=flaviostutz
docker-compose up
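Once the service is up, Jupyter is reachable on the mapped port 8888. The sketch below builds the address to open in a browser. It assumes the image hands JUPYTER_TOKEN to Jupyter's standard token authentication, which accepts the token as a `?token=` query parameter; `jupyter_url` is a hypothetical helper name, not something shipped with the image.

```shell
# Hypothetical helper: build the Jupyter URL for a host, port, and token.
# With an empty token, the plain address is printed.
jupyter_url() {
  local host="${1:-localhost}"
  local port="${2:-8888}"
  local token="${3:-}"
  if [ -n "$token" ]; then
    printf 'http://%s:%s/?token=%s\n' "$host" "$port" "$token"
  else
    printf 'http://%s:%s/\n' "$host" "$port"
  fi
}

# Values taken from the compose file above.
jupyter_url localhost 8888 flaviostutz
# prints http://localhost:8888/?token=flaviostutz
```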
GPU support for TensorFlow:
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update && sudo apt-get install -y --no-install-recommends cuda-drivers
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0/nvidia-docker_1.0.0-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
nvidia-docker run -d -v /root:/notebooks -v /root/input:/notebooks/input -v /root/output:/notebooks/output -p 8888:8888 -p 6006:6006 --name jupyter flaviostutz/datascience-tools:latest-gpu
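If the same script has to work on hosts with and without a GPU, one option is to check whether `nvidia-docker` is installed and fall back to the CPU image otherwise. This is only a sketch: `select_runner` and `select_image` are hypothetical helper names, and the command is printed rather than executed so it can be inspected first.

```shell
# Hypothetical helpers: choose the runner and image tag depending on
# whether nvidia-docker is available on PATH.
select_runner() {
  if command -v nvidia-docker >/dev/null 2>&1; then
    echo "nvidia-docker"
  else
    echo "docker"
  fi
}

select_image() {
  if command -v nvidia-docker >/dev/null 2>&1; then
    echo "flaviostutz/datascience-tools:latest-gpu"
  else
    echo "flaviostutz/datascience-tools"
  fi
}

# Print the command instead of running it.
echo "$(select_runner) run -d -p 8888:8888 -p 6006:6006 --name jupyter $(select_image)"
```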
If you wish this container to run automatically on host boot, add these lines to /etc/rc.local:
cd /root/datascience-tools/run
./boot.sh >> /var/log/boot-script
Access:
#!/bin/bash
python test.py
docker build . -f Dockerfile
docker build . -f Dockerfile-gpu
A good practice is to store your notebook scripts in a Git repository.
Run the datascience-tools container and map the "/notebooks" volume inside the container to the path where you cloned the Git repository on your computer.
You can edit, save, and run the scripts from the web interface (http://localhost:8888) or directly with other tools on your computer. Because the volume is mapped, you can commit and push your code to the repository directly, with no copying to or from the container.
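The mapping described above can be sketched as a small shell helper. It resolves the clone directory to an absolute path, since Docker bind mounts expect one, and prints the `docker run` command rather than executing it. `notebook_run_cmd` is a hypothetical name, and the directory you pass in is whatever path holds your own clone.

```shell
# Hypothetical helper: print a docker command that maps a local Git clone
# to /notebooks inside the container.
notebook_run_cmd() {
  local repo_dir="$1"
  local abs_dir
  # Resolve to an absolute path; fail if the directory does not exist.
  abs_dir="$(cd "$repo_dir" && pwd)" || return 1
  printf 'docker run -d -p 8888:8888 -p 6006:6006 -v "%s:/notebooks" flaviostutz/datascience-tools\n' "$abs_dir"
}

# Example: use the current directory as the cloned repository.
notebook_run_cmd .
```

Files saved from the Jupyter web interface then appear in the clone on the host, where the usual `git add`, `git commit`, and `git push` apply.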
version: "3"
services:
  datascience-tools:
    image: flaviostutz/datascience-tools
    ports:
      - 8888:8888
      - 6006:6006
    volumes:
      - /Users/flaviostutz/Documents/development/flaviostutz/puzzler/notebooks:/notebooks
JUPYTER_TOKEN - token that users need to open Jupyter. Defaults to '', in which case no token or password is requested.
SPARK_MASTER - Spark master address. Used if you want to send jobs to an external Spark cluster and still control the whole job from Jupyter Notebook itself.
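Both variables can also be passed with `-e` on the `docker run` command line. The sketch below composes those flags from the current environment and skips any variable that is unset or empty. `env_flags` is a hypothetical helper, and `spark://spark-master:7077` is only a placeholder in the usual Spark standalone URL form; the exact address format the image expects is not stated here, so treat it as an assumption.

```shell
# Hypothetical helper: build -e flags for the variables described above.
# Unset or empty variables are left out.
env_flags() {
  local flags=""
  if [ -n "${JUPYTER_TOKEN:-}" ]; then
    flags="$flags -e JUPYTER_TOKEN=$JUPYTER_TOKEN"
  fi
  if [ -n "${SPARK_MASTER:-}" ]; then
    flags="$flags -e SPARK_MASTER=$SPARK_MASTER"
  fi
  # Strip the leading space before printing.
  echo "${flags# }"
}

# Placeholder values; print the resulting command instead of running it.
flags="$(JUPYTER_TOKEN=mysecret SPARK_MASTER=spark://spark-master:7077 env_flags)"
echo "docker run -d -p 8888:8888 $flags flaviostutz/datascience-tools"
```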