3rd Workshop on Maritime Computer Vision (MaCVi)

USV-based Embedded Obstacle Segmentation

Quick links: Dataset download | Validate ONNX | Submit | Leaderboards | Ask for help

Quick Start

  1. Download the LaRS dataset from the dataset page.
  2. Train your model on the LaRS training set.
  3. Export the model as an ONNX file.
  4. Submit the ONNX model through the upload page.

🏅 Prizes

The top team of this challenge stands to win $500 worth of Luxonis devices.

Overview

This challenge is an extension of the USV-based Obstacle Segmentation challenge. For an overview of the task and datasets refer to the challenge page.

The aim of this challenge is to develop obstacle segmentation methods suitable for deployment on embedded devices. For this reason, your methods will be run, benchmarked and evaluated on a real-world device: an upcoming next-generation Luxonis device based on the Robotic Vision Core 4 (RVC4).

Task

Create a semantic segmentation method that classifies each pixel in a given image into one of three classes: sky, water or obstacle. An obstacle is anything that the USV could crash into or should avoid (e.g. boats, swimmers, land, buoys). Refer to the main challenge page for more details.

Evaluation metrics

In the evaluation we will consider the same obstacle detection quality metrics as in the main challenge. In addition, we will also evaluate the throughput of the methods in terms of frames-per-second (FPS) processed on the evaluation device.

To be considered for the challenge, a method must run faster than the set threshold of 30 FPS at the 384x768 input shape. Throughput will be evaluated in the device's regular (balanced) mode.

To determine the winner of the challenge, the aggregate Q (Quality) metric will be used. In case of a tie, the FPS of the methods will be used to determine the winner (faster is better).

Participate

To participate in the challenge follow these steps:

  1. Download the LaRS dataset (LaRS webpage).
  2. Train a semantic segmentation model on the LaRS training set. You may also use additional publicly available training data, but you must disclose it during submission.
    • Note: the mmsegmentation-macvi repository may be a good starting point for developing your model. It contains scripts for training and inference on LaRS.
  3. Export your model as ONNX. See below for more details.
    • Note: You may use the tools/pytorch2onnx.py script in the mmsegmentation-macvi repository as a starting point.
  4. Validate your .onnx file to check for potential errors here.
    • We will run a shortened version of the model conversion to make sure everything is OK. You will receive the results in the dashboard, along with any errors.
    • This step is recommended to quickly identify errors without having to wait for the full evaluation.
    • The model validation process has no daily upload limit.
  5. Upload your .onnx file along with all the required information here.
    • After submission, your method will be run and evaluated on the device. This may take some time (up to an hour), so be patient.
    • Results of the evaluation will appear on your dashboard. In case of a failed evaluation, the dashboard will also display a message (hover over the error icon).
    • You may evaluate at most one submission per day (per challenge track). Failed attempts do not count towards this limit.

Model Guidelines

Since the hardware imposes certain limitations, we provide guidelines for model development. This section describes the allowed operations, model requirements, input and output definitions, good practices, and the expected throughput and accuracy drop.

Allowed Operations

Before the model is run on the device, it is compiled and quantized to the appropriate format. While the quantization to INT8 and part of the compilation are executed automatically, participants are still required to perform the first step themselves: converting the trained PyTorch model to ONNX.

Below is a short example of PyTorch-to-ONNX conversion that creates a model.onnx with a fixed input shape, which could be submitted to MaCVi:
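A minimal sketch, assuming a recent torchvision; the FCN model here is only a stand-in for your own trained network (replace the model construction and checkpoint loading with your own):

    # Minimal PyTorch -> ONNX export sketch. The torchvision FCN is only a
    # stand-in for your own trained model; load your own weights instead.
    import torch
    import torchvision

    class Wrapper(torch.nn.Module):
        """Return a plain tensor of logits instead of torchvision's output dict."""
        def __init__(self):
            super().__init__()
            self.net = torchvision.models.segmentation.fcn_resnet50(
                weights=None, weights_backbone=None, num_classes=3)

        def forward(self, x):
            return self.net(x)["out"]

    model = Wrapper().eval()
    # model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))  # your weights

    # Fixed input shape: batch of 1, 3 channels (RGB), 384 x 768 (H x W).
    dummy_input = torch.randn(1, 3, 384, 768)

    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        opset_version=11,               # choose an opset supported by the device
        input_names=["image"],
        output_names=["logits"],
        # no dynamic_axes: the submitted model must have a fixed input shape
    )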

Some general guidelines to ensure that your model will be exportable:

  • Export with a fixed input shape (do not use dynamic axes).
  • Avoid custom operations; they are not supported.
  • Stick to ONNX operations that are supported on the device (see the spreadsheet linked below).

Feel free to refer to the official PyTorch documentation for more instructions on export, limitations, and good practices. Please note that custom operations are not supported.

Furthermore, there might be additional limitations on the device itself, where certain ONNX operations are not supported. Please refer to this spreadsheet for the list of supported and unsupported ONNX operations. If a certain operation is missing from the list, we suggest checking whether the model compiles and executes correctly by submitting an untrained model.
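For a quick local check, the operator types contained in an exported model can be listed with the onnx package and compared against the spreadsheet; a small sketch:

    # List the distinct ONNX operator types used by an exported model so they
    # can be compared against the spreadsheet of supported operations.
    import onnx

    model = onnx.load("model.onnx")
    op_types = sorted({node.op_type for node in model.graph.node})
    print("\n".join(op_types))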

Inputs and outputs of the model
Inputs

Because models can expect differently processed inputs and the evaluation on the device does not expose the pre-processing options to the participants, it is important that the model expects the following input: a single RGB image at the fixed 384x768 input shape, pre-processed as described below.

Since images in the dataset can have different sizes, they will be resized (using bilinear interpolation) to fit the above shape (with the aspect ratio preserved), centered within the shape, and padded using mirror padding.

Example of Python code using PIL and NumPy to read and normalize the image:
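A minimal sketch (frame.jpg is a placeholder file name; the mean/std values below are the common ImageNet statistics and are only an assumption here, so make sure they match the normalization your model was trained with):

    # Read an RGB image and normalize it into the float32 CHW layout expected
    # by the model. The mean/std values are the common ImageNet statistics and
    # are used here for illustration only -- match them to your training setup.
    import numpy as np
    from PIL import Image

    mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
    std = np.array([58.395, 57.12, 57.375], dtype=np.float32)

    img = np.asarray(Image.open("frame.jpg").convert("RGB"), dtype=np.float32)
    img = (img - mean) / std              # per-channel normalization
    img = img.transpose(2, 0, 1)          # HWC -> CHW
    img = img[np.newaxis, ...]            # add batch dimension: 1 x 3 x H x W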

Such images will be fed to the model on the device, and the outputs will be automatically evaluated in the same manner as in the classic segmentation track. Note that the example above does not include the padding and resizing.
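For reference, the resize-and-pad step described above could be sketched as follows (illustrative only; the actual pre-processing is performed by the evaluation pipeline):

    # Resize an image to fit 768 x 384 with the aspect ratio preserved, center
    # it, and fill the border with mirror padding, as described above.
    import numpy as np
    from PIL import Image

    TARGET_W, TARGET_H = 768, 384

    img = Image.open("frame.jpg").convert("RGB")
    scale = min(TARGET_W / img.width, TARGET_H / img.height)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = np.asarray(img.resize((new_w, new_h), Image.BILINEAR))

    pad_w, pad_h = TARGET_W - new_w, TARGET_H - new_h
    left, top = pad_w // 2, pad_h // 2
    padded = np.pad(
        resized,
        ((top, pad_h - top), (left, pad_w - left), (0, 0)),
        mode="reflect",                   # mirror padding
    )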

Outputs

The output should be the predicted per-class logits, i.e. one channel for each of the three classes (sky, water, obstacle) at the model's input resolution.

Please note that, as part of the post-processing step, the predicted logits will be argmaxed to obtain the segmentation mask, which is then cropped and upscaled (using nearest interpolation) to the original image shape.
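A sketch of that post-processing with example values for the padding offsets and original image size (the actual values depend on the image and are handled by the evaluation pipeline):

    # Post-processing sketch: argmax over the class logits, crop away the
    # padded border, and upscale the mask to the original image size with
    # nearest interpolation. All sizes/offsets below are example values.
    import numpy as np
    from PIL import Image

    logits = np.random.randn(1, 3, 384, 768).astype(np.float32)  # stand-in model output
    top, left, new_h, new_w = 0, 64, 384, 640                    # example padding offsets
    orig_w, orig_h = 1280, 768                                   # example original size

    mask = np.argmax(logits[0], axis=0).astype(np.uint8)         # per-pixel class index
    mask = mask[top:top + new_h, left:left + new_w]              # remove the padding
    mask = np.asarray(Image.fromarray(mask).resize((orig_w, orig_h), Image.NEAREST))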

Good Practices

While the model will be quantized to INT8, it is recommended to use FP32 during training, as it is more stable and allows for better convergence. It is also recommended to train with the same input shape that will be used during inference.

Expected Throughput and Accuracy Drop

In the table below, we report the F1 detection score (and its drop) together with the FPS for several common segmentation methods. These benchmarks show what kind of performance drop and throughput to expect from your methods.

The models were exported using the MMSegmentation toolbox. An input shape of 768x384 was used for all deployed models.

Variants:

  • orig: the original FP32 model (baseline).
  • orig (768x384): the original FP32 model evaluated at the 768x384 input shape.
  • quantized: the INT8-quantized model running on the device at the 768x384 input shape.

Note: most of the performance drop is not due to quantization, but due to using a lower input resolution during inference. To improve performance, we suggest training specifically for the target resolution.

Method                   Variant          F1              FPS
FCN (ResNet-50)          orig             57.9            -
FCN (ResNet-50)          orig (768x384)   52.8 (-5.2)     -
FCN (ResNet-50)          quantized        54.0 (-3.9)     19.7
FCN (ResNet-101)         orig             63.4            -
FCN (ResNet-101)         orig (768x384)   52.7 (-10.6)    -
FCN (ResNet-101)         quantized        53.8 (-9.6)     16.8
DeepLabv3+ (ResNet-101)  orig             64.0            -
DeepLabv3+ (ResNet-101)  orig (768x384)   58.0 (-6.0)     -
DeepLabv3+ (ResNet-101)  quantized        57.4 (-6.6)     16.6
BiSeNetv1 (ResNet-50)    orig             42.8            -
BiSeNetv1 (ResNet-50)    orig (768x384)   45.1 (+2.3)     -
BiSeNetv1 (ResNet-50)    quantized        45.6 (+2.8)     28.7
BiSeNetv2 (-)            orig             54.7            -
BiSeNetv2 (-)            orig (768x384)   42.9 (-11.8)    -
BiSeNetv2 (-)            quantized        46.0 (-8.7)     44.6
STDC1 (-)                orig             61.8            -
STDC1 (-)                orig (768x384)   48.5 (-13.3)    -
STDC1 (-)                quantized        47.9 (-13.9)    45.8
STDC2 (-)                orig             64.3            -
STDC2 (-)                orig (768x384)   50.8 (-13.5)    -
STDC2 (-)                quantized        49.9 (-14.4)    38.3
SegFormer (MiT-B2)       orig             70.0            -
SegFormer (MiT-B2)       orig (768x384)   61.8 (-8.2)     -
SegFormer (MiT-B2)       quantized        58.0 (-12.0)    15.0

Terms and Conditions

In case of any questions regarding the challenge datasets or submission, please join the MaCVi Support forum.