Maritime Computer Vision Workshop @ CVPR 2026


Multimodal Semantic Segmentation Challenge


Overview

Using multiple sensors in a unified model can enable or improve scene interpretation under difficult conditions, such as low light or rain.

Task

The task is to produce per-pixel semantic labels for the given sensor data, using the label set defined by the dataset.

Dataset

The multimodal semantic segmentation challenge is based on the MULTIAQUA dataset, using a subset of the sensors specifically suited for low-light perception. The dataset contains synchronized and aligned data from an RGB camera, a LiDAR, and a thermal (FLIR) camera. The RGB data is structured as [H×W×3] matrices, while the thermal images are provided as [H×W] arrays. The LiDAR data is not structured up front but contains raw and processed values in a NumPy array. The values provided for each LiDAR point are the 3D coordinates X, Y, Z, the distance d to the camera, the point's reflectivity r, and the coordinates x, y of the point's 2D projection onto the RGB image plane. The thermal images and LiDAR points are pre-processed and projected to the image plane of the RGB images. The specific way to use the thermal and LiDAR data is not prescribed; the scaling, channel layout, and pre-processing are left to the participants' discretion.
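
As a concrete illustration, the sketch below shows one way the modalities could be combined into a single input tensor. It assumes the per-point column order (X, Y, Z, d, r, x, y) described above; the function name, channel choices, and normalization are illustrative only, since the challenge leaves pre-processing open.

```python
import numpy as np

def build_input_tensor(rgb, thermal, lidar_points):
    """Stack RGB, thermal, and rasterized LiDAR channels into an [H, W, 6] array."""
    h, w, _ = rgb.shape

    # Rasterize LiDAR distance and reflectivity onto the RGB image plane,
    # using each point's precomputed 2D projection (assumed columns 5 and 6).
    distance = np.zeros((h, w), dtype=np.float32)
    reflectivity = np.zeros((h, w), dtype=np.float32)
    xs = lidar_points[:, 5].astype(int)
    ys = lidar_points[:, 6].astype(int)
    valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    distance[ys[valid], xs[valid]] = lidar_points[valid, 3]      # column 3: d
    reflectivity[ys[valid], xs[valid]] = lidar_points[valid, 4]  # column 4: r

    # Normalize each modality to [0, 1]; this scaling is illustrative only.
    rgb_n = rgb.astype(np.float32) / 255.0
    thermal_f = thermal.astype(np.float32)
    thermal_n = (thermal_f - thermal_f.min()) / max(float(np.ptp(thermal_f)), 1e-6)
    distance_n = distance / max(float(distance.max()), 1e-6)

    return np.dstack([rgb_n, thermal_n, distance_n, reflectivity])
```

Pixels without a LiDAR return stay zero in the distance and reflectivity channels; participants may prefer densification, separate branches per modality, or a different LiDAR encoding entirely.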

Splits

The dataset contains a training and a validation set of multimodal data, along with corresponding semantic labels. The train and validation sets were captured in daytime under different weather conditions and on different waterways. The test set was captured during nighttime and serves as a challenging use case where all three modalities must be used. The ground truth annotations for the test set are not provided to the participants.

Participate

The models must be structured so that they can accept all three modalities, and all three modalities must actually be processed; submissions may not ignore any modality.
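
For illustration, a minimal sketch of a model skeleton that structurally accepts all three modalities is shown below (here in PyTorch). The branch widths, the fusion by channel concatenation, and the placeholder class count are assumptions, not part of the challenge specification.

```python
import torch
import torch.nn as nn

class TriModalSegNet(nn.Module):
    """Toy three-branch segmentation network: one encoder per modality."""

    def __init__(self, num_classes=8, lidar_channels=2):  # class count is a placeholder
        super().__init__()
        self.rgb_branch = nn.Conv2d(3, 16, 3, padding=1)
        self.thermal_branch = nn.Conv2d(1, 16, 3, padding=1)
        self.lidar_branch = nn.Conv2d(lidar_channels, 16, 3, padding=1)
        # Fuse by channel concatenation, then predict per-pixel logits.
        self.head = nn.Conv2d(48, num_classes, 1)

    def forward(self, rgb, thermal, lidar):
        feats = torch.cat(
            [self.rgb_branch(rgb),
             self.thermal_branch(thermal),
             self.lidar_branch(lidar)],
            dim=1,
        )
        return self.head(feats)  # [B, num_classes, H, W]
```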

The challenge is structured as follows: the participants receive data that was captured during daytime as well as the corresponding ground truth labels. This will serve as the training data for the challenge. Using pretrained models and other datasets is allowed, but the data used in training the model must be stated in the submission report.

The final evaluation will be performed on the MaCVi server on the nighttime dataset. The main metric of the challenge is the mIoU on the nighttime test set. The final structure of leaderboard metrics will be published when the leaderboard opens. The participants will have to upload their model's predictions on the validation and test sets; the performance metrics will then be calculated on the server and displayed on the leaderboard.
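
For reference, a minimal sketch of the mIoU metric over integer label maps is given below. The class count and the handling of classes absent from both maps are assumptions; the server-side implementation may differ, for example in how ignore labels are treated.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Compute class-averaged IoU between two [H, W] integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # Class absent in both maps; skip rather than count as 1.
        intersection = np.logical_and(pred_c, gt_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))
```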