USV-based Embedded Obstacle Segmentation
The top team of this challenge stands to win $500 worth of Luxonis devices.
The aim of this challenge is to develop obstacle segmentation methods suitable for deployment on embedded devices. For this reason, your methods will be run, benchmarked and evaluated on a real-world device: an upcoming next-generation Luxonis device based on the Robotic Vision Core 4 (RVC4).
Create a semantic segmentation method that classifies the pixels in a given image into one of three classes: sky, water or obstacle. An obstacle is everything that the USV can crash into or that it should avoid (e.g. boats, swimmers, land, buoys).
LaRS consists of 4000+ USV-centric scenes captured in various aquatic domains. It includes per-pixel panoptic masks for water, sky and different types of obstacles. At a high level, obstacles are divided into (i) dynamic obstacles, which are objects floating in the water (e.g. boats, buoys, swimmers), and (ii) static obstacles, which are all remaining obstacle regions (shoreline, piers). Additionally, dynamic obstacles are categorized into 8 obstacle classes: boat/ship, row boat, buoy, float, paddle board, swimmer, animal and other.
This challenge is based on the semantic segmentation sub-track of LaRS: the annotations include semantic segmentation masks, where all obstacles are assigned into a single "obstacle" class.
The LaRS evaluation protocol is designed to score predictions in a way that is meaningful for practical USV navigation. Methods are evaluated in terms of general segmentation quality (mIoU) and obstacle detection quality (F1 score).
Besides prediction accuracy, we will also evaluate the throughput of the methods in terms of frames-per-second (FPS) processed on the evaluation device.
To be considered for the challenge, a method must run faster than the set threshold of 30 FPS with a 384x768 input shape. The throughput will be evaluated in regular (balanced) mode.
To determine the winner of the challenge, we use the aggregate quality metric Q = mIoU x F1, which combines general segmentation quality, measured by the mIoU, with detection quality, measured by the F1 score.
In case of a tie, the FPS of the methods will be used to determine the winner (faster is better).

To participate in the challenge, train your segmentation model, export it to ONNX, and submit the exported model for evaluation. You can use the tools/pytorch2onnx.py script in the mmsegmentation-macvi repository as a starting point for the export.

Since the hardware imposes certain limitations, we provide guidelines for model development below. This section describes the allowed operations, model requirements, input definition, good practices, and the throughput and accuracy drop to expect.
Before the model is run on the device, it is compiled and quantized to the appropriate format. While the quantization to INT8 and part of the compilation are executed automatically, participants are still required to perform the first step: converting the trained PyTorch model to ONNX.
Below is a short example of PyTorch-to-ONNX conversion that creates a model.onnx with a fixed input shape, which can be submitted to MaCVi:
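A minimal sketch, assuming a trained `torch.nn.Module`; the tiny placeholder network, opset version, and input/output names below are illustrative rather than prescribed by the challenge:

```python
import torch
import torch.nn as nn

# Tiny placeholder standing in for your trained segmentation network
# (3 output channels = sky / water / obstacle logits).
model = nn.Conv2d(3, 3, kernel_size=1)
model.eval()

# Fixed input shape expected on the device: 1 x 3 x 384 x 768 (N, C, H, W).
dummy_input = torch.randn(1, 3, 384, 768)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,          # assumption: adjust to a version the device toolchain accepts
    do_constant_folding=True,  # fold constant subgraphs for a simpler graph
)
```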
To ensure that your model is exportable, refer to the official PyTorch documentation for instructions on export, its limitations, and good practices. Please note that custom operations are not supported.
Furthermore, there may be additional limitations on the device itself, where certain ONNX operations are not supported. Please refer to this spreadsheet for the list of supported and unsupported ONNX operations. If an operation is missing from the list, we suggest checking whether the model compiles and executes correctly by submitting an untrained model.
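As a quick sanity check before submitting, you can list the operation types used in your exported graph and compare them against the spreadsheet; a minimal sketch using the `onnx` Python package (the file name is a placeholder):

```python
import onnx

# Load the exported model and collect the distinct ONNX operation types it uses.
model = onnx.load("model.onnx")
op_types = sorted({node.op_type for node in model.graph.node})

print("Operations used by the exported model:")
for op in op_types:
    print(" -", op)
```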
Because different models can expect differently pre-processed inputs, and the on-device evaluation does not expose pre-processing options to the participants, it is important that your model expects the following input:

- RGB images normalized with ImageNet statistics: mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]
- A fixed input shape of 1x3x384x768 (N, C, H, W) at inference time; the shape can be arbitrary during training.

Since images in the dataset can have different sizes, they will be resized (using bilinear interpolation) to fit the above shape with the aspect ratio preserved, centered within the shape, and padded using mirror padding.
Example of Python code using PIL and NumPy to read and normalize an image:
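A minimal sketch along those lines (the image path is a placeholder):

```python
import numpy as np
from PIL import Image

# ImageNet normalization statistics required by the challenge.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Placeholder path to an input image.
img = Image.open("example.jpg").convert("RGB")

# Scale to [0, 1], normalize per channel, and reorder to 1 x 3 x H x W.
x = np.asarray(img, dtype=np.float32) / 255.0
x = (x - MEAN) / STD
x = x.transpose(2, 0, 1)[None, ...]
```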
Such images will be fed to the model on the device, and the outputs will be automatically evaluated in the same manner as in the classic segmentation track. Note that the example above does not include the padding and resizing (see the sketch below).
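For completeness, a minimal sketch of the resize-and-pad step described above, assuming an RGB input and that mirror padding corresponds to NumPy's reflect mode; the exact on-device implementation may differ:

```python
import numpy as np
from PIL import Image

TARGET_H, TARGET_W = 384, 768

def resize_and_pad(img: Image.Image) -> np.ndarray:
    """Aspect-preserving bilinear resize, centering, and mirror padding to 384x768."""
    scale = min(TARGET_H / img.height, TARGET_W / img.width)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = np.array(img.resize((new_w, new_h), Image.BILINEAR))

    # Distribute the remaining space evenly around the image and fill it by mirroring.
    pad_top, pad_left = (TARGET_H - new_h) // 2, (TARGET_W - new_w) // 2
    return np.pad(
        resized,
        ((pad_top, TARGET_H - new_h - pad_top),
         (pad_left, TARGET_W - new_w - pad_left),
         (0, 0)),
        mode="reflect",  # assumption: mirror padding == NumPy's "reflect" mode
    )
```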
The output should be a 1x1x384x768 tensor of predictions (argmaxed logits). Please note that, as part of the post-processing step, the predicted logits will be argmaxed to obtain the segmentation mask, which is then cropped and upscaled (using nearest interpolation) to the original input shape.
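If your network outputs per-class logits, one way to make the exported graph emit the required 1x1x384x768 prediction map is to wrap it with an argmax before export; a minimal sketch (the placeholder network is illustrative, and it is worth confirming that ArgMax appears in the supported-operations spreadsheet):

```python
import torch
import torch.nn as nn

class ArgmaxWrapper(nn.Module):
    """Wraps a segmentation network so that the exported graph emits class indices."""

    def __init__(self, net: nn.Module):
        super().__init__()
        self.net = net

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.net(x)                              # 1 x num_classes x 384 x 768
        return torch.argmax(logits, dim=1, keepdim=True)  # 1 x 1 x 384 x 768

# Tiny placeholder standing in for your trained network (3-class logits).
wrapped = ArgmaxWrapper(nn.Conv2d(3, 3, kernel_size=1)).eval()
torch.onnx.export(wrapped, torch.randn(1, 3, 384, 768), "model.onnx")
```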
While the model will be quantized to INT8, we recommend using FP32 during training, as it is more stable and allows for better convergence. We also recommend training with the same input shape that will be used during inference.
In the table below, we report the F1 detection score and the throughput (FPS) for several common segmentation methods, along with the score drop relative to the original models. These benchmarks indicate what kind of performance drop and throughput to expect from your own methods.
The models were exported using the MMSegmentation toolbox. An input shape of 768x384 was used for all deployed models.
Variants:

- orig: the original FP32 model evaluated at its original input resolution
- orig (768x384): the original FP32 model evaluated with the 768x384 input shape
- quantized: the INT8-quantized model running on the evaluation device with the 768x384 input shape
Note: Most of the performance drop is not due to quantization, but due to using a lower input resolution during inference. To improve performance, we suggest training specifically for the target resolution.
| Method | Variant | F1 (%) | FPS |
|---|---|---|---|
| FCN (ResNet-50) | orig | 57.9 | - |
| FCN (ResNet-50) | orig (768x384) | 52.8 (-5.2) | - |
| FCN (ResNet-50) | quantized | 54.0 (-3.9) | 19.7 |
| FCN (ResNet-101) | orig | 63.4 | - |
| FCN (ResNet-101) | orig (768x384) | 52.7 (-10.6) | - |
| FCN (ResNet-101) | quantized | 53.8 (-9.6) | 16.8 |
| DeepLabv3+ (ResNet-101) | orig | 64.0 | - |
| DeepLabv3+ (ResNet-101) | orig (768x384) | 58.0 (-6.0) | - |
| DeepLabv3+ (ResNet-101) | quantized | 57.4 (-6.6) | 16.6 |
| BiSeNetv1 (ResNet-50) | orig | 42.8 | - |
| BiSeNetv1 (ResNet-50) | orig (768x384) | 45.1 (+2.3) | - |
| BiSeNetv1 (ResNet-50) | quantized | 45.6 (+2.8) | 28.7 |
| BiSeNetv2 (-) | orig | 54.7 | - |
| BiSeNetv2 (-) | orig (768x384) | 42.9 (-11.8) | - |
| BiSeNetv2 (-) | quantized | 46.0 (-8.7) | 44.6 |
| STDC1 (-) | orig | 61.8 | - |
| STDC1 (-) | orig (768x384) | 48.5 (-13.3) | - |
| STDC1 (-) | quantized | 47.9 (-13.9) | 45.8 |
| STDC2 (-) | orig | 64.3 | - |
| STDC2 (-) | orig (768x384) | 50.8 (-13.5) | - |
| STDC2 (-) | quantized | 49.9 (-14.4) | 38.3 |
| SegFormer (MiT-B2) | orig | 70.0 | - |
| SegFormer (MiT-B2) | orig (768x384) | 61.8 (-8.2) | - |
| SegFormer (MiT-B2) | quantized | 58.0 (-12.0) | 15.0 |