Find the download links to several datasets here:
Here you can find more details about the datasets and the format requirements your predictions need to follow to be properly evaluated. Note that you need to be registered first to participate in the benchmark. See the instructions for that in the FAQs.
You need to submit a json-file in COCO format. That is, a list of dictionaries, each containing a single prediction of the form
{ "image_id": 6503, "category_id": 4, "score": 0.6774558424949646, "bbox": [ 426.15203857421875, 563.6422119140625, 43.328399658203125, 18.97894287109375 ] }
The predictions are evaluated on AP@[0.5:0.05:0.95] (AP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05), AP50, AP75, AR1 and AR10.
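Purely as an illustration, a minimal Python sketch of assembling such a prediction list and writing the json-file (the detection values and file names below are placeholders, not part of the benchmark):

import json

def to_coco_predictions(per_image_detections):
    # per_image_detections: {image_id: [(category_id, score, (x, y, w, h)), ...]}
    predictions = []
    for image_id, detections in per_image_detections.items():
        for category_id, score, (x, y, w, h) in detections:
            predictions.append({
                "image_id": int(image_id),
                "category_id": int(category_id),
                "score": float(score),
                # COCO bbox convention: [x_min, y_min, width, height] in pixels
                "bbox": [float(x), float(y), float(w), float(h)],
            })
    return predictions

# Example with a single placeholder detection for image 6503.
dummy = {6503: [(4, 0.68, (426.2, 563.6, 43.3, 19.0))]}
with open("predictions.json", "w") as f:
    json.dump(to_coco_predictions(dummy), f)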
You need to submit a json-file in COCO format. That is, a list of dictionaries, each containing a single prediction of the same form as for Object Detection above. Please note that there might be some errors in the metadata values of the test set images (around IDs 8129 to 8143).
There are 80 testing video clips on which the performance is measured. The protocol is based on the implementation of PyTracking. Note that this is a short-term Single-Object Tracking task, meaning that each clip only features objects that are present throughout the entire clip and do not disappear and reappear.
You need to submit a zip-file containing exactly 80 text-files, each corresponding to the respective clip. Each text file has to be named j.txt where j is the number corresponding to the respective clip (1,...,80). Each text file has as many rows as its corresponding clip has frames. Each row has 4 comma separated numbers (x,y,w,h), where x is the left-most pixel value of the tracked object's bounding box, y the top-most pixel value and w and h the width and height of the bounding box in pixels.
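For illustration, a minimal Python sketch of assembling such a zip-file (the boxes below are dummy values standing in for your tracker's output):

import zipfile

# One j.txt per clip (j = 1,...,80) with one "x,y,w,h" row per frame,
# all placed directly in the root of the zip-archive.
tracker_results = {j: [(0.0, 0.0, 100.0, 50.0)] * 3 for j in range(1, 81)}  # dummy boxes
with zipfile.ZipFile("submission.zip", "w") as zf:
    for j, boxes in tracker_results.items():
        rows = ["{},{},{},{}".format(x, y, w, h) for (x, y, w, h) in boxes]
        zf.writestr("{}.txt".format(j), "\n".join(rows))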
Via the link for "Single-Object Tracking" above, you will find three json-files (SeaDronesSee_train.json, SeaDronesSee_val.json, SeaDronesSee_test.json). Each is a dict of dicts, where for each track/clip number you find the corresponding frames that need to be taken from the Multi-Object Tracking link.
For example, in the following you see the first track starting with frame 000486.png, with corresponding path lake_constance_v2021_tracking/images/test/486.png, followed by frame 000487.png and so on. Afterwards, clip 2 starts with frame 000494.png and so on:
{"1": {"000486.png": "lake_constance_v2021_tracking/images/test/486.png", "000487.png": "lake_constance_v2021_tracking/images/test/487.png", "000488.png": "lake_constance_v2021_tracking/images/test/488.png",... "000636.png": "lake_constance_v2021_tracking/images/test/636.png"}, "2": {"000494.png": "lake_constance_v2021_tracking/images/test/494.png",...
Furthermore, you find three folders: train_annotations, val_annotations, test_annotations_first_frame. For the train and val sets, these contain the corresponding annotations for each clip: every clip has its own text file in which each line is the bounding box for the respective frame. The test folder also contains a text file for each clip, but it only provides the bounding box ground truth for the very first frame and dummy values for the succeeding frames.
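A small sketch of how this data could be read in Python; note that the naming of the per-clip annotation files and the comma-separated layout of their lines are assumptions here, so please check them against the downloaded folders:

import json

with open("SeaDronesSee_test.json") as f:
    clips = json.load(f)  # dict of dicts: clip number -> {frame name: image path}

for clip_id, frames in clips.items():
    image_paths = list(frames.values())  # frame paths of this clip, in order
    # Assumed file name pattern for the first-frame ground truth of this clip.
    with open("test_annotations_first_frame/{}.txt".format(clip_id)) as f:
        x, y, w, h = [float(v) for v in f.readline().strip().split(",")]
    # Initialize the tracker on image_paths[0] with (x, y, w, h) and
    # predict one box per remaining frame.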
See also the compressed folder sample_submission.zip in the Single-Object Tracking nextcloud folder. This zip-archive could be uploaded right away but will naturally yield bad results.
The predictions are evaluated using precision and success scores.
Please see also the MOT competition as part of MaCVi at WACV 2023!
There are 22 video clips in the data on which you can test your trained tracker. The mapping from images to video clips can be done via the files 'instances_test_objects_in_water.json' and 'instances_test_swimmer.json', respectively. They can be found via the link above.
For example, '410.png' from the test set can be assigned to video clip 'DJI_0057.MP4' because its entry in the annotation file looks like this:
{'id': 410, 'file_name': '410.png', 'height': 2160, 'width': 3840, 'source': {'drone': 'mavic', 'folder_name': 'DJI_0057', 'video': 'DJI_0057.MP4', 'frame_no': 715}, 'video_id': 0, 'frame_index': 715, 'date_time': '2020-08-27T14:18:35.823800', 'meta': {'date_time': '2020-08-27T12:18:36', 'gps_latitude': 47.671949, 'gps_latitude_ref': 'N', 'gps_longitude': 9.269724, 'gps_longitude_ref': 'E', 'altitude': 8.599580615665955, 'gimbal_pitch': 45.4, 'compass_heading': 138.2, 'gimbal_heading': 140.9, 'speed': 0.6399828341528834, 'xspeed': -0.39998927134555207, 'yspeed': 0.39998927134555207, 'zspeed': 0.299991953509164}}
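For illustration, a sketch of grouping the test images by clip in Python, assuming the usual COCO layout where these entries are stored under the "images" key:

import json
from collections import defaultdict

with open("instances_test_objects_in_water.json") as f:
    annotations = json.load(f)

frames_per_clip = defaultdict(list)
for image in annotations["images"]:
    clip = image["source"]["video"]  # e.g. 'DJI_0057.MP4'
    frames_per_clip[clip].append((image["frame_index"], image["file_name"]))

for clip in frames_per_clip:
    frames_per_clip[clip].sort()  # order the frames of each clip by frame index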
This is a toy data set for the task of binary image classification. It aims to provide a simple hands-on benchmark for testing small neural networks. There are the following two classes:
1 - if the image contains any watercraft instance including boats, ships, surfboards, ... ON the water
0 - all the rest, i.e. just water or anything on the land (could also be boats)
Naturally, there may be edge cases (e.g. boats at the boundary between the water and the shore). As metrics, we employ the prediction accuracy (number of correctly predicted images divided by the number of all images) and the number of parameters of the model. For this benchmark, you can upload your trained ONNX model to be ranked on the leaderboard. For that, please refer to this sample script. It trains a simple single-layer perceptron architecture on this data set and then saves and exports the PyTorch model as an ONNX file. Make sure the exported model uses the transformation provided in this code, as this is the transformation used for the webserver evaluation.
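The linked sample script is the reference for this; purely as an illustration of the export step, a minimal sketch with an untrained single-layer perceptron could look as follows (the input resolution and file name are assumptions, not the values used by the sample script):

import torch
import torch.nn as nn

# Hypothetical single-layer perceptron for the two classes; the 3x224x224 input is assumed.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))
dummy_input = torch.randn(1, 3, 224, 224)  # fixes the input shape expected by the ONNX model
torch.onnx.export(model, dummy_input, "boat_classifier.onnx",
                  input_names=["input"], output_names=["output"])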
LaRS is a large and diverse panoptic (and semantic) segmentation dataset for obstacle detection in USVs. Please visit the LaRS webpage for more information. In order to submit the results of your semantic or panoptic segmentation method, prepare the predictions as follows.
Your method should classify each pixel in the image into one of three classes: water, obstacles or sky. The predictions should be saved in a single .png file (example) with the same name as the original image. The colors used for the classes should be as follows:
water: (41, 167, 224)
obstacles: (247, 195, 37)
sky: (90, 75, 164)
Place the predictions in a .zip file and upload it to the webserver. Make sure that the predictions are located in the root of the .zip file (i.e. there are no directories included in the .zip file).
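As a sketch, the color-coded prediction files could be produced and zipped as follows (the 0/1/2 class indexing of the model output and the image size are assumptions):

import zipfile
import numpy as np
from PIL import Image

COLORS = {0: (41, 167, 224),   # water
          1: (247, 195, 37),   # obstacles
          2: (90, 75, 164)}    # sky

def save_prediction(class_map, png_name):
    # class_map: HxW array of class indices (assumed 0 = water, 1 = obstacles, 2 = sky)
    rgb = np.zeros(class_map.shape + (3,), dtype=np.uint8)
    for idx, color in COLORS.items():
        rgb[class_map == idx] = color
    Image.fromarray(rgb).save(png_name)

# Example: a dummy all-water prediction, stored in the root of the zip.
save_prediction(np.zeros((384, 512), dtype=np.uint8), "example_image.png")
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("example_image.png", arcname="example_image.png")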
Your method should produce panoptic instance masks for stuff and thing classes. The predictions should be saved in a single .png file (example) with the same name as the original image. The .png file should follow the format of the ground-truth panoptic masks:
The class id is stored in the R channel, while the G and B channels encode the instance id (instance_id = G * 256 + B). The class ids are as follows:
0: VOID
1: [stuff] static obstacles
3: [stuff] water
5: [stuff] sky
11: [thing] boat/ship
12: [thing] row boat
13: [thing] paddle board
14: [thing] buoy
15: [thing] swimmer
16: [thing] animal
17: [thing] float
19: [thing] other
Place the predictions in a .zip file and upload it to the webserver. Make sure that the predictions are located in the root of the .zip file (i.e. there are no directories included in the .zip file).
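A minimal sketch of writing one such panoptic mask with the encoding described above (NumPy and Pillow are used here as an example tooling choice; the image size is a placeholder):

import numpy as np
from PIL import Image

def save_panoptic_mask(class_ids, instance_ids, png_name):
    # class_ids, instance_ids: HxW integer arrays for one image.
    mask = np.zeros(class_ids.shape + (3,), dtype=np.uint8)
    mask[..., 0] = class_ids             # R: class id (see table above)
    mask[..., 1] = instance_ids // 256   # G: high byte of the instance id
    mask[..., 2] = instance_ids % 256    # B: low byte (instance_id = G * 256 + B)
    Image.fromarray(mask).save(png_name)

# Example: a dummy mask that is entirely water (class 3, instance id 0).
save_panoptic_mask(np.full((384, 512), 3, dtype=np.uint8),
                   np.zeros((384, 512), dtype=np.int64), "example_image.png")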
The dataset comprises about 3,000 images of maritime navigational aids, mainly featuring red and green buoy markers. Only the training set is provided, while the test set is withheld to benchmark model performance in the competition.
For accurate distance measurement, ground truth values are derived using the haversine distance between the GPS coordinates of the camera in each frame and the mapped positions of the buoys.
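For reference, a small sketch of the standard haversine formula (the Earth radius is the usual mean-radius approximation; the coordinates in the example are dummy values):

import math

def haversine_distance(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two GPS coordinates given in degrees.
    r = 6371000.0  # approximate mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: camera position vs. a mapped buoy position (dummy coordinates).
print(haversine_distance(47.671949, 9.269724, 47.672105, 9.270100))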
The dataset adheres to the YOLO format, with images and labels stored in separate directories. Each image has an associated label file (.txt) containing bounding box information. Each line in a label file represents a bounding box in the format: