Find the download links to several datasets here:
Here you can find more details about the datasets and the format requirements your predictions need to follow to be properly evaluated. Note that you need to be registered first to participate in the benchmark. See the instructions for that in the FAQs.
You need to submit a json-file in COCO format. That is, a list of dictionaries, each containing a single prediction of the form
{ "image_id": 6503, "category_id": 4, "score": 0.6774558424949646, "bbox": [ 426.15203857421875, 563.6422119140625, 43.328399658203125, 18.97894287109375 ] }
The predictions are evaluated on AP@[0.5:0.05:0.95] (AP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05), AP50, AP75, AR1 and AR10.
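Purely as an illustration, a minimal Python sketch of assembling such a prediction list and writing the json-file (the detection values and file names below are placeholders, not part of the benchmark):

import json

def to_coco_predictions(per_image_detections):
    # per_image_detections: {image_id: [(category_id, score, (x, y, w, h)), ...]}
    predictions = []
    for image_id, detections in per_image_detections.items():
        for category_id, score, (x, y, w, h) in detections:
            predictions.append({
                "image_id": int(image_id),
                "category_id": int(category_id),
                "score": float(score),
                # COCO bbox convention: [x_min, y_min, width, height] in pixels
                "bbox": [float(x), float(y), float(w), float(h)],
            })
    return predictions

# Example with a single placeholder detection for image 6503.
dummy = {6503: [(4, 0.68, (426.2, 563.6, 43.3, 19.0))]}
with open("predictions.json", "w") as f:
    json.dump(to_coco_predictions(dummy), f)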
You need to submit a json-file in COCO format. That is, a list of dictionaries, each containing a single prediction of the same form as for Object Detection above. Please note that there might be some errors in the metadata values of the test set images (around IDs 8129 to 8143).
There are 80 testing video clips on which the performance is measured. The protocol is based on the implementation of PyTracking. Note that this is a short-term Single-Object Tracking task, meaning that each clip only features objects that are present throughout the entire clip and do not disappear and reappear.
You need to submit a zip-file containing exactly 80 text-files, each corresponding to the respective clip. Each text file has to be named j.txt where j is the number corresponding to the respective clip (1,...,80). Each text file has as many rows as its corresponding clip has frames. Each row has 4 comma separated numbers (x,y,w,h), where x is the left-most pixel value of the tracked object's bounding box, y the top-most pixel value and w and h the width and height of the bounding box in pixels.
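For illustration, a minimal Python sketch of assembling such a zip-file (the boxes below are dummy values standing in for your tracker's output):

import zipfile

# One j.txt per clip (j = 1,...,80) with one "x,y,w,h" row per frame,
# all placed directly in the root of the zip-archive.
tracker_results = {j: [(0.0, 0.0, 100.0, 50.0)] * 3 for j in range(1, 81)}  # dummy boxes
with zipfile.ZipFile("submission.zip", "w") as zf:
    for j, boxes in tracker_results.items():
        rows = ["{},{},{},{}".format(x, y, w, h) for (x, y, w, h) in boxes]
        zf.writestr("{}.txt".format(j), "\n".join(rows))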
Via the link for "Single-Object Tracking" above, you will find three json-files (SeaDronesSee_train.json, SeaDronesSee_val.json, SeaDronesSee_test.json). Each is a dict of dicts, where for each track/clip number you find the corresponding frames that need to be taken from the Multi-Object Tracking link.
For example, in the following you see the first track starting with frame 000486.png, with corresponding path lake_constance_v2021_tracking/images/test/486.png, followed by frame 000487.png and so on. Afterwards, clip 2 starts with frame 000494.png and so on:
{"1": {"000486.png": "lake_constance_v2021_tracking/images/test/486.png", "000487.png": "lake_constance_v2021_tracking/images/test/487.png", "000488.png": "lake_constance_v2021_tracking/images/test/488.png",... "000636.png": "lake_constance_v2021_tracking/images/test/636.png"}, "2": {"000494.png": "lake_constance_v2021_tracking/images/test/494.png",...
Furthermore, you find three folders: train_annotations, val_annotations, test_annotations_first_frame. For the train and val sets, these contain the corresponding annotations for each clip: every clip has its own text file in which each line is the bounding box for the respective frame. The test folder also contains a text file for each clip, but it only provides the bounding box ground truth for the very first frame and dummy values for the succeeding frames.
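A small sketch of how this data could be read in Python; note that the naming of the per-clip annotation files and the comma-separated layout of their lines are assumptions here, so please check them against the downloaded folders:

import json

with open("SeaDronesSee_test.json") as f:
    clips = json.load(f)  # dict of dicts: clip number -> {frame name: image path}

for clip_id, frames in clips.items():
    image_paths = list(frames.values())  # frame paths of this clip, in order
    # Assumed file name pattern for the first-frame ground truth of this clip.
    with open("test_annotations_first_frame/{}.txt".format(clip_id)) as f:
        x, y, w, h = [float(v) for v in f.readline().strip().split(",")]
    # Initialize the tracker on image_paths[0] with (x, y, w, h) and
    # predict one box per remaining frame.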
See also the compressed folder sample_submission.zip in the Single-Object Tracking nextcloud folder. This zip-archive could be uploaded right away but will naturally yield bad results.
The predictions are evaluated using precision and success scores.
Please see also the MOT competition as part of MaCVi at WACV 2023!
There are 22 video clips in the data on which you can test your trained tracker. The mapping from images to video clips can be done via the files 'instances_test_objects_in_water.json' and 'instances_test_swimmer.json', respectively. They can be found via the link above.
For example, '410.png' from the test set can be assigned to video clip 'DJI_0057.MP4' because its entry in the annotation file looks like this:
{'id': 410, 'file_name': '410.png', 'height': 2160, 'width': 3840, 'source': {'drone': 'mavic', 'folder_name': 'DJI_0057', 'video': 'DJI_0057.MP4', 'frame_no': 715}, 'video_id': 0, 'frame_index': 715, 'date_time': '2020-08-27T14:18:35.823800', 'meta': {'date_time': '2020-08-27T12:18:36', 'gps_latitude': 47.671949, 'gps_latitude_ref': 'N', 'gps_longitude': 9.269724, 'gps_longitude_ref': 'E', 'altitude': 8.599580615665955, 'gimbal_pitch': 45.4, 'compass_heading': 138.2, 'gimbal_heading': 140.9, 'speed': 0.6399828341528834, 'xspeed': -0.39998927134555207, 'yspeed': 0.39998927134555207, 'zspeed': 0.299991953509164}}
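For illustration, a sketch of grouping the test images by clip in Python, assuming the usual COCO layout where these entries are stored under the "images" key:

import json
from collections import defaultdict

with open("instances_test_objects_in_water.json") as f:
    annotations = json.load(f)

frames_per_clip = defaultdict(list)
for image in annotations["images"]:
    clip = image["source"]["video"]  # e.g. 'DJI_0057.MP4'
    frames_per_clip[clip].append((image["frame_index"], image["file_name"]))

for clip in frames_per_clip:
    frames_per_clip[clip].sort()  # order the frames of each clip by frame index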
This is a toy data set for the task of binary image classification. It aims to provide a simple hands-on benchmark for testing small neural networks. There are the following two classes:
1 - if the image contains any watercraft instance including boats, ships, surfboards, ... ON the water
0 - all the rest, i.e. just water or anything on the land (could also be boats)
Naturally, there may be edge cases (e.g. boats at the boundary between the water and the shore). As metrics, we employ the prediction accuracy (number of correctly predicted images divided by the number of all images) and the number of parameters of the model. For this benchmark, you can upload your trained ONNX model to be ranked on the leaderboard. For that, please refer to this sample script. It trains a simple single-layer perceptron architecture on this data set and then saves and exports the PyTorch model as an ONNX file. Make sure the exported model uses the transformation provided in this code, as this is the transformation used for the webserver evaluation.
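The linked sample script is the reference for this; purely as an illustration of the export step, a minimal sketch with an untrained single-layer perceptron could look as follows (the input resolution and file name are assumptions, not the values used by the sample script):

import torch
import torch.nn as nn

# Hypothetical single-layer perceptron for the two classes; the 3x224x224 input is assumed.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))
dummy_input = torch.randn(1, 3, 224, 224)  # fixes the input shape expected by the ONNX model
torch.onnx.export(model, dummy_input, "boat_classifier.onnx",
                  input_names=["input"], output_names=["output"])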
LaRS is a large and diverse panoptic (and semantic) segmentation dataset for obstacle detection in USVs. Please visit the LaRS webpage for more information. In order to submit the results of your semantic or panoptic segmentation method, prepare the predictions as follows.
Your method should classify each pixel in the image into one of three classes: water, obstacles or sky. The predictions should be saved in a single .png file (example) with the same name as the original image. The colors used for the classes should be as follows:
water: (41, 167, 224)
obstacles: (247, 195, 37)
sky: (90, 75, 164)
Place the predictions in a .zip file and upload it to the webserver. Make sure that the predictions are located in the root of the .zip file (i.e. there are no directories included in the .zip file).
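As a sketch, the color-coded prediction files could be produced and zipped as follows (the 0/1/2 class indexing of the model output and the image size are assumptions):

import zipfile
import numpy as np
from PIL import Image

COLORS = {0: (41, 167, 224),   # water
          1: (247, 195, 37),   # obstacles
          2: (90, 75, 164)}    # sky

def save_prediction(class_map, png_name):
    # class_map: HxW array of class indices (assumed 0 = water, 1 = obstacles, 2 = sky)
    rgb = np.zeros(class_map.shape + (3,), dtype=np.uint8)
    for idx, color in COLORS.items():
        rgb[class_map == idx] = color
    Image.fromarray(rgb).save(png_name)

# Example: a dummy all-water prediction, stored in the root of the zip.
save_prediction(np.zeros((384, 512), dtype=np.uint8), "example_image.png")
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("example_image.png", arcname="example_image.png")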
Your method should produce panoptic instance masks for stuff and thing classes. The predictions should be saved in a single .png file (example) with the same name as the original image. The .png file should follow the format of the ground-truth panoptic masks:
The class id is stored in the R channel, while the G and B channels encode the instance id (instance_id = G * 256 + B). The class ids are as follows:
0: VOID
1: [stuff] static obstacles
3: [stuff] water
5: [stuff] sky
11: [thing] boat/ship
12: [thing] row boat
13: [thing] paddle board
14: [thing] buoy
15: [thing] swimmer
16: [thing] animal
17: [thing] float
19: [thing] other
Place the predictions in a .zip file and upload it to the webserver. Make sure that the predictions are located in the root of the .zip file (i.e. there are no directories included in the .zip file).
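A minimal sketch of writing one such panoptic mask with the encoding described above (NumPy and Pillow are used here as an example tooling choice; the image size is a placeholder):

import numpy as np
from PIL import Image

def save_panoptic_mask(class_ids, instance_ids, png_name):
    # class_ids, instance_ids: HxW integer arrays for one image.
    mask = np.zeros(class_ids.shape + (3,), dtype=np.uint8)
    mask[..., 0] = class_ids             # R: class id (see table above)
    mask[..., 1] = instance_ids // 256   # G: high byte of the instance id
    mask[..., 2] = instance_ids % 256    # B: low byte (instance_id = G * 256 + B)
    Image.fromarray(mask).save(png_name)

# Example: a dummy mask that is entirely water (class 3, instance id 0).
save_panoptic_mask(np.full((384, 512), 3, dtype=np.uint8),
                   np.zeros((384, 512), dtype=np.int64), "example_image.png")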
The dataset comprises about 3,000 images of maritime navigational aids, mainly featuring red and green buoy markers. Only the training set is provided, while the test set is withheld to benchmark model performance in the competition.
For accurate distance measurement, ground truth values are derived using the haversine distance between the GPS coordinates of the camera in each frame and the mapped positions of the buoys.
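For reference, a small sketch of the standard haversine formula (the Earth radius is the usual mean-radius approximation; the coordinates in the example are dummy values):

import math

def haversine_distance(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two GPS coordinates given in degrees.
    r = 6371000.0  # approximate mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: camera position vs. a mapped buoy position (dummy coordinates).
print(haversine_distance(47.671949, 9.269724, 47.672105, 9.270100))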
The dataset adheres to the YOLO format, with images and labels stored in separate directories. Each image has an associated label file (.txt) containing bounding box information. Each line in a label file represents a bounding box in the format: