Use the YOLO v3 (ONNX) model for object detection in C# using ML.Net
MIT License
Another case study based on this YOLO v3 model is available here.
See here for YOLO v4 use.
Use the YOLO v3 algorithm for object detection in C# with ML.Net. We start with a Torch model, convert it to ONNX format, and then use it in ML.Net.
This is a case study on a YOLO model trained for document layout analysis. The model comes from the following Medium article: Object Detection — Document Layout Analysis Using Monk AI.
import sys

from IPython.display import Image

# Make the Monk YOLO v3 inference library importable
sys.path.append("../Monk_Object_Detection/7_yolov3/lib")
from infer_detector import Infer

gtf = Infer()

# Read the class names, one per line
with open("dla_yolov3/classes.txt") as f:
    class_list = f.readlines()

# Load the trained weights on CPU with a 416 x 416 input
model_name = "yolov3"
weights = "dla_yolov3/dla_yolov3.pt"
gtf.Model(model_name, class_list, weights, use_gpu=False, input_size=(416, 416))

# Run detection on a test image, then display the image found in output/
img_path = "test_square.jpg"
gtf.Predict(img_path, conf_thres=0.2, iou_thres=0.5)
Image(filename="output/test_square.jpg")
You need to set ONNX_EXPORT = True in ...\Monk_Object_Detection\7_yolov3\lib\models.py before loading the model.
We name the input layer image and the 2 output layers classes and bboxes. This is not required, but it makes the model easier to read.
import torch

# Dummy input with the expected shape: batch of 1, 3 channels, 416 x 416 pixels
dummy_input = torch.randn(1, 3, 416, 416)
# Squash the values into (0, 1), like a normalized image (possibly superfluous)
dummy_input = torch.sigmoid(dummy_input)

torch.onnx.export(
    gtf.system_dict["local"]["model"],
    dummy_input,
    "dla_yolov3.onnx",
    input_names=["image"],
    output_names=["classes", "bboxes"],
    opset_version=9,
)
The ONNX model can be viewed in Netron. Our model looks like this:
As per this article:
For an image of size 416 x 416, YOLO predicts ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10,647 bounding boxes.
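The count can be checked with a few lines of Python. This is a minimal sketch, assuming the usual YOLO v3 strides of 32, 16 and 8 and 3 anchor boxes per grid cell; the function name `count_boxes` is hypothetical and not part of the model code.

```python
def count_boxes(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Count the boxes predicted across all detection scales."""
    total = 0
    for stride in strides:
        grid = input_size // stride  # 13, 26 and 52 for a 416 x 416 input
        total += grid * grid * anchors_per_cell
    return total


print(count_boxes())  # 10647
```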
The bboxes output layer is of size [10,647 x 4]. This corresponds to 10,647 bounding boxes x 4 bounding box coordinates (x, y, h, w).
The classes output layer is of size [10,647 x 18]. This corresponds to 10,647 bounding boxes x 18 classes (this model has only 18 classes).
Hence, each bounding box has (4 + classes) = 22 features. The total number of predictions in this model is 22 x 10,647 = 234,234.
NB: The ONNX conversion removes 1 feature which is the objectness score, pc. The original model has (5 + classes) features for each bounding box. We will use the class probability as a proxy for the objectness score.
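A minimal sketch of this proxy in plain Python: for each box, the highest class probability serves as the confidence score, and boxes below a threshold are dropped. The function name `filter_boxes`, the nested-list layout of the two outputs and the toy values are assumptions for illustration; the 0.2 threshold mirrors `conf_thres` in the Python inference call.

```python
def filter_boxes(classes, bboxes, conf_thres=0.2):
    """Keep boxes whose best class probability reaches conf_thres.

    classes: one list of class probabilities per box
    bboxes:  one list of 4 coordinates per box
    Returns a list of (class_index, score, box) tuples.
    """
    detections = []
    for scores, box in zip(classes, bboxes):
        # The best class probability stands in for the missing objectness score
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= conf_thres:
            detections.append((best, scores[best], box))
    return detections


# Toy example: 3 boxes and 2 classes (the real model has 10,647 boxes and 18 classes)
classes = [[0.05, 0.10], [0.90, 0.02], [0.01, 0.30]]
bboxes = [[10, 10, 5, 5], [50, 60, 20, 40], [100, 120, 30, 30]]
print(filter_boxes(classes, bboxes))
# [(0, 0.9, [50, 60, 20, 40]), (1, 0.3, [100, 120, 30, 30])]
```

Boxes that survive this step can still overlap, so a non-maximum suppression pass would typically follow.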
More information can be found in this article: YOLO v3 theory explained