# Kaggle HubMap
Our approach is built on understanding the challenges behind the data. Our main contribution is modeling the link between healthy and unhealthy glomeruli by predicting them as two separate classes. We incorporate several external datasets into our pipeline and manually annotated them with the two classes. Our model architecture is relatively simple, and the pipeline can easily be transferred to other tasks.
You can read more about our solution here. A more concise write-up is also available here.
The `main` branch contains a cleaned and simplified version of our pipeline, which is enough to reproduce our solution.
Our pipeline achieves highly competitive performance on the task. The code is also convenient to use, especially for researchers.
Clone the repository.

[TODO: Requirements]
Download the data:
- Put the competition data in the `input` folder.
- Put the Dataset A images from data.mendeley.com in the `input/extra/` folder.
- Put the test images in the `input/test/` folder.
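As a quick sanity check before running the notebooks, here is a minimal sketch (assuming the folder names above, relative to the repository root; it only checks that the directories exist):

```python
from pathlib import Path

# Folders expected by the pipeline, from the download steps above.
EXPECTED_FOLDERS = ["input", "input/extra", "input/test"]

for folder in EXPECTED_FOLDERS:
    path = Path(folder)
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")
```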
Prepare the data:
- Run `notebooks/Json to Mask.ipynb` to generate the masks:
  - Use the `ADD_FC` and `ONLY_FC` parameters to generate labels for the healthy and unhealthy classes.
  - Use the `SAVE_TIFF` parameter to save the external data as tiff files at half resolution.
  - Use the `PLOT` parameter to visualize the masks.
  - Use the `SAVE` parameter to save the masks as RLE (see the sketch after this list).
- Run `notebooks/Image downscaling.ipynb` to downscale the images and masks:
  - Use the `FACTOR` parameter to specify the downscaling factor. We recommend generating data at downscaling factors 2 and 4.
  - Use the `NAME` parameter to specify which RLE to downscale. Make sure to run the script for all the dataframes you want to use.
  - Use the `SAVE` and `SAVE_IMG` parameters to save the downscaled masks and images.
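The masks are stored as run-length encodings (RLE). The actual utilities live in `code/utils/rle.py`; as an illustration, here is a minimal sketch of Kaggle-style RLE encoding and decoding (the function names and the exact convention used by the repo are assumptions):

```python
import numpy as np

def rle_encode(mask):
    # Kaggle-style RLE: column-major order, 1-indexed start positions.
    pixels = np.concatenate([[0], mask.flatten(order="F"), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]  # turn run end positions into run lengths
    return " ".join(map(str, runs))

def rle_decode(rle, shape):
    # Inverse operation: rebuild a binary mask of the given (height, width).
    s = np.asarray(rle.split(), dtype=int)
    starts, lengths = s[0::2] - 1, s[1::2]
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape(shape, order="F")
```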
Train models using `notebooks/Training.ipynb`:
- Use the `DEBUG` parameter to launch the code in debug mode (single fold, no logging).
- Training parameters are specified in the `Config` class. Feel free to experiment with them; the main ones are listed below (a sketch of how they map to a model follows the list):
  - `tile_size`: Tile size.
  - `reduce_factor`: Downscaling factor.
  - `on_spot_sampling`: Probability to accept a random tile within the dataset.
  - `overlap_factor`: Tile overlap during inference.
  - `selected_folds`: Folds to run computations for.
  - `encoder`: Encoder, as defined in Segmentation Models PyTorch.
  - `decoder`: Decoder, from Segmentation Models PyTorch.
  - `num_classes`: Number of classes. Keep it at 2 to use the healthy and unhealthy classes.
  - `loss`: Loss function. We use BCE, but the Lovasz loss is also interesting.
  - `optimizer`: Optimizer name.
  - `batch_size`: Training batch size; adapt the `BATCH_SIZES` dictionary to your GPU.
  - `val_bs`: Validation batch size.
  - `epochs`: Number of training epochs.
  - `iter_per_epoch`: Number of tiles to use per epoch.
  - `lr`: Learning rate, decayed linearly.
  - `warmup_prop`: Proportion of steps used for learning-rate warmup.
  - `mix_proba`: Probability to apply MixUp.
  - `mix_alpha`: Alpha parameter for MixUp.
  - `use_pl`: Probability to sample a tile from the pseudo-labeled images.
  - `use_external`: Probability to sample a tile from the external images.
  - `pl_path`: Path to the pseudo-labels generated by `notebooks/Inference_test.ipynb`.
  - `extra_path`: Path to the extra labels generated by `notebooks/Json to Mask.ipynb` (should not be changed).
  - `rle_path`: Path to the train labels downscaled by `notebooks/Image downscaling.ipynb` (should not be changed).
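To make the model-related parameters concrete, here is a hedged sketch of how such a config maps to a Segmentation Models PyTorch model (the values are illustrative defaults, not our competition settings):

```python
import segmentation_models_pytorch as smp

class Config:
    # Illustrative values only, not the settings used for our solution.
    tile_size = 256
    reduce_factor = 2
    encoder = "resnet34"   # any encoder name supported by smp
    decoder = "Unet"       # smp exposes decoders as classes: Unet, FPN, ...
    num_classes = 2        # healthy + unhealthy glomeruli
    batch_size = 32
    epochs = 20
    lr = 1e-3

def build_model(config):
    # Look up the decoder class by name and instantiate it with the chosen encoder.
    decoder_class = getattr(smp, config.decoder)
    return decoder_class(
        encoder_name=config.encoder,
        encoder_weights="imagenet",
        classes=config.num_classes,
    )

model = build_model(Config)
```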
Validate models with `notebooks/Inference.ipynb`:
- Use the `log_folder` parameter to specify the experiment.
- Use the `use_tta` parameter to specify whether to use test-time augmentations.
- Use the `save` parameter to indicate whether to save predictions.
- Use the `save_all_tta` parameter to save predictions for each TTA (takes a lot of disk space).
- Use the `global_threshold` parameter to tweak the threshold (see the sketch below).
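Predictions are made on overlapping tiles and averaged before thresholding, which is what `overlap_factor` and `global_threshold` control. A minimal sketch of the idea (the real implementation lives in `code/inference/`; `predict_tile` and the edge handling here are simplified assumptions):

```python
import numpy as np

def predict_image(predict_tile, image, tile_size=256, overlap_factor=1.5, threshold=0.5):
    # predict_tile maps a (tile_size, tile_size, C) tile to a (tile_size, tile_size)
    # probability map. Overlapping predictions are averaged, then thresholded.
    step = int(tile_size / overlap_factor)
    h, w = image.shape[:2]
    probs = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    # Append the last row/column of tiles so the image borders are covered.
    ys = list(range(0, h - tile_size, step)) + [h - tile_size]
    xs = list(range(0, w - tile_size, step)) + [w - tile_size]
    for y in ys:
        for x in xs:
            probs[y:y + tile_size, x:x + tile_size] += predict_tile(
                image[y:y + tile_size, x:x + tile_size]
            )
            counts[y:y + tile_size, x:x + tile_size] += 1
    return (probs / counts) > threshold
```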
Generate pseudo-labels with `notebooks/Inference Test.ipynb`:
- Use the `log_folder` parameter to specify the experiment.
- Use the `use_tta` parameter to specify whether to use test-time augmentations.
- Use the `save` parameter to indicate whether to save predictions.
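The generated pseudo-labels are consumed at training time through the `use_pl` probability, and external data through `use_external`. A hypothetical sketch of that sampling logic:

```python
import random

def sample_tile(train_tiles, pl_tiles, external_tiles, use_pl=0.2, use_external=0.2):
    # With probability use_pl, draw a tile from the pseudo-labeled test images;
    # with probability use_external, from the external images; otherwise from train.
    r = random.random()
    if r < use_pl:
        return random.choice(pl_tiles)
    if r < use_pl + use_external:
        return random.choice(external_tiles)
    return random.choice(train_tiles)
```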
Visualize predictions with `notebooks/Visualize Predictions.ipynb`:
- Tweak the `name`, `log_folder` and `sub` parameters according to what you want to plot.

If you wish to dive into the code, the repository naming should be straightforward, and each function is documented. The structure is the following:
```
code
├── data
│   ├── dataset.py      # Torch datasets
│   └── transforms.py   # Augmentations
├── inference
│   ├── main_test.py    # Inference for the test data
│   └── main.py         # Inference for the train data
├── model_zoo
│   └── models.py       # Model definition
├── training
│   ├── lovasz.py       # Lovasz loss implementation
│   ├── main.py         # k-fold and training main functions
│   ├── meter.py        # Meter for evaluation during training
│   ├── mix.py          # CutMix and MixUp
│   ├── optim.py        # Losses and optimizer handling
│   ├── predict.py      # Functions for prediction
│   └── train.py        # Fitting a model
├── utils
│   ├── logger.py       # Logging utils
│   ├── metrics.py      # Metrics for the competition
│   ├── plots.py        # Plotting utils
│   ├── rle.py          # RLE encoding utils
│   └── torch.py        # Torch utils
└── params.py           # Main parameters
```