Kaggle Data Science Bowl 2018
This is a DWT-inspired solution to Kaggle's 2018 Data Science Bowl, produced in roughly one week before the end of the competition.
UPDATE 2018-04-22: my final standing was 114th. I guess they are still cleaning the LB.
UPDATE 2018-04-24: found out why my model generalized poorly - I forgot to re-create the optimizer after unfreezing the encoder weights.
Most prominently, it features a dockerized PyTorch implementation of an approach similar to the Deep Watershed Transform.
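To give an intuition for the watershed-transform idea, here is a minimal self-contained sketch. All names here are illustrative, not the repository's API: the network predicts quantized per-pixel "energy levels", high-energy cores seed the instances, and the remaining foreground pixels are assigned to the nearest seed (a cheap stand-in for the true watershed flood):

```python
import numpy as np
from scipy import ndimage

def split_instances(energy, seed_level=2, mask_level=1):
    """Assign every foreground pixel to the nearest high-energy core."""
    mask = energy >= mask_level                      # full foreground
    seeds, _ = ndimage.label(energy >= seed_level)   # one seed per core
    # nearest-seed assignment via the EDT index map
    _, idx = ndimage.distance_transform_edt(seeds == 0, return_indices=True)
    return seeds[tuple(idx)] * mask

# two touching "nuclei": cores at energy 2, joined by a bridge at energy 1
energy = np.zeros((7, 9), dtype=int)
energy[2:5, 1:4] = 2
energy[2:5, 5:8] = 2
energy[3, 4] = 1
labels = split_instances(energy)
print(labels.max())  # 2 - the touching blobs are split into two instances
```

The key point is that a plain binary-mask model would merge the two touching blobs into one connected component, while the energy-level formulation keeps them separable.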
Since the target metric was highly unstable (mean average precision over IoU thresholds from 0.5 to 0.95) and the private LB contained data mostly unrelated to the train dataset, it's difficult to evaluate the code's performance objectively, but it's safe to say that:
Key take-aways:

- Training
- Inference
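For reference, the competition metric mentioned above - precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 - can be sketched in a few lines of numpy. This is a simplified greedy version over a precomputed IoU matrix, not the exact official matching rules:

```python
import numpy as np

def average_precision(iou, thresholds=np.arange(0.5, 1.0, 0.05)):
    """iou: [n_true, n_pred] IoU matrix between GT and predicted instances.
    Returns precision averaged over the IoU thresholds."""
    scores = []
    for t in thresholds:
        matches = iou > t
        tp = int((matches.sum(axis=1) >= 1).sum())  # GT objects hit
        fn = iou.shape[0] - tp                      # GT objects missed
        fp = int((matches.sum(axis=0) == 0).sum())  # spurious predictions
        scores.append(tp / (tp + fp + fn))
    return float(np.mean(scores))

# one perfect match, one near-miss (IoU 0.6), one spurious prediction
iou = np.array([[1.0, 0.0, 0.0],
                [0.6, 0.0, 0.0]]).T  # shape [2 GT, 3 pred] after transpose
iou = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.6, 0.0]])
print(round(average_precision(iou), 3))  # 0.333
```

The instability is visible even in this toy case: the near-miss counts as a hit for thresholds below 0.6 and as both a false negative and a false positive above it, so small IoU changes swing the score a lot.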
Clone the repository:

```
git clone https://github.com/snakers4/ds_bowl_2018 .
```
This repository contains a Dockerfile used when training models:

- `/dockerfiles/Dockerfile` - this is the main Dockerfile

Build a Docker image:

```
cd dockerfiles
docker build -t bowl_image .
```
Install the latest nvidia-docker. Follow the instructions from here; please prefer nvidia-docker2 for more stable performance.

To test that everything works, run:

```
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```
Run the docker container (IMPORTANT)

Unless you use this exact command (with the `--shm-size` flag), the PyTorch data loaders WILL NOT WORK (you can change the ports and mounted volumes, of course):

```
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -it -v /path/to/cloned/repository:/home/keras/notebook -p 8888:8888 -p 6006:6006 --shm-size 16G bowl_image
```

or, with the legacy nvidia-docker:

```
nvidia-docker run -it -v /path/to/cloned/repository:/home/keras/notebook -p 8888:8888 -p 6006:6006 --shm-size 8G aveysov
```
To start a stopped container and get a shell inside it:

```
docker start -i YOUR_CONTAINER_ID
docker exec -it YOUR_CONTAINER_ID /bin/bash
```
Place the competition data under `data/`; the folder already contains pickled train dataframes with meta-data (for convenience only). After all of your manipulations your directory should look like this (omitting csv files):
```
├── README.md             <- The top-level README for developers using this project.
├── data
│   ├── stage1_test           <- A folder with stage1 test data
│   ├── stage2_test           <- A folder with stage2 test data
│   ├── test_df_stage1_meta   <- A pickled dataframe with stage1 test meta data
│   ├── train_df_stage1_meta  <- A pickled dataframe with stage1 train meta data
│   └── stage1_train          <- A folder with stage1 train data
│       ├── f8e74d4006dd68c1dbe68df7be905835e00d8ba4916f3b18884509a15fdc0b55
│       │   ├── images
│       │   └── masks
│       ...
│       └── ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d19659189951f540f48
│
├── dockerfiles           <- A folder with Dockerfiles
│
└── src                   <- Source code
```
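The pickled meta dataframes can be opened with `pandas.read_pickle`. A self-contained round-trip sketch; the column names below are made up for illustration - inspect the real `data/train_df_stage1_meta` for its actual schema:

```python
import os
import tempfile
import pandas as pd

# stand-in for data/train_df_stage1_meta - the real file ships with the repo
df = pd.DataFrame({
    "img_id": ["f8e74d40", "ff599c73"],  # hypothetical columns
    "width": [256, 320],
    "height": [256, 256],
})
path = os.path.join(tempfile.gettempdir(), "train_df_stage1_meta_demo")
df.to_pickle(path)

meta = pd.read_pickle(path)
print(len(meta))  # 2
```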
You can see the list of available model presets in `src/models/model_params.py`. The best model according to my tests was Unet16 (UNet + VGG16 pre-trained encoder).
If all is ok, use the following commands to train the model:

```
docker exec -it YOUR_CONTAINER_ID /bin/bash
cd src
tensorboard --logdir='ds_bowl_2018/src/tb_logs' --port=6006
```
Run the training via the jupyter notebook console or via tmux + docker exec (the model converges in 100-150 epochs):

```
echo 'python3 train_energy.py \
--arch unet16_160_7_dc --epochs 150 --workers 10 \
--channels 7 --batch-size 12 --fold_num 0 \
--lr 1e-3 --optimizer adam \
--bce_weight 0.9 --dice_weight 0.1 --ths 0.5 \
--print-freq 1 --lognumber unet16_160_7_dc_ths5_energy_distance_gray_final \
--tensorboard True --tensorboard_images True --is_distance_transform True --is_boundaries True \
--freeze True
python3 train_energy.py \
--arch unet16_160_7_dc --epochs 150 --workers 10 \
--channels 7 --batch-size 12 --fold_num 1 \
--lr 1e-3 --optimizer adam \
--bce_weight 0.9 --dice_weight 0.1 --ths 0.5 \
--print-freq 1 --lognumber unet16_160_7_dc_ths5_energy_distance_gray_final \
--tensorboard True --tensorboard_images True --is_distance_transform True --is_boundaries True \
--freeze True' > train.sh
sh train.sh
```
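Judging by the `--bce_weight 0.9 --dice_weight 0.1` flags, the training objective is a weighted sum of binary cross-entropy and a soft Dice loss. A minimal numpy sketch of that combination follows - the real loss lives in the repository's PyTorch code and may differ in detail:

```python
import numpy as np

def bce_dice_loss(pred, target, bce_weight=0.9, dice_weight=0.1, eps=1e-7):
    """Weighted BCE + soft-Dice on probability maps (numpy sketch)."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # soft Dice: 1 - 2*|intersection| / (|pred| + |target|)
    dice = 1 - (2 * (pred * target).sum() + eps) / (pred.sum() + target.sum() + eps)
    return bce_weight * bce + dice_weight * dice

target = np.array([1.0, 1.0, 0.0, 0.0])
good = bce_dice_loss(np.array([0.9, 0.8, 0.2, 0.1]), target)
bad = bce_dice_loss(np.array([0.1, 0.2, 0.8, 0.9]), target)
print(good < bad)  # True - better predictions score lower
```

Weighting BCE heavily keeps per-pixel gradients well behaved, while the small Dice term counteracts the foreground/background class imbalance typical for nuclei masks.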
To run the prediction:

```
docker exec -it YOUR_CONTAINER_ID /bin/bash
cd src
echo 'python3 train_energy.py \
--arch unet16_64_7_dc --channels 7 --batch-size 1 --ths 0.5 \
--lognumber unet16_64_7_dc_ths5_energy_distance_gray_longer_rerun \
--workers 0 --predict' > predict.sh
sh predict.sh
```
Note that the `lognumber` is the one you specified during training. Also check which fold is used in the prediction loop.
You can also run an evaluation-only pass like this:

```
python3 train_energy.py \
--evaluate \
--resume weights/unet16_160_7_dc_ths5_energy_distance_gray_final_fold2_best.pth.tar \
--arch unet16_160_7_dc --epochs 50 --workers 10 \
--channels 7 --fold_num 2 \
--ths 0.5 --is_distance_transform True --is_boundaries True \
--print-freq 10 --lognumber eval_validation --tensorboard_images True
```
Notes:

- `utils.watershed.energy_baseline` is the post-processing that was used; `utils.watershed` performed worse
- `src/train_energy_pad.py` is also available; it works, but produces inferior quality

Use these notebooks at your own risk!
- `src/bowl.ipynb` - a general debugging notebook with new models / generators / etc.