2nd Workshop on Maritime Computer Vision (MaCVi)
MaCVi 2024 Challenges have concluded. Check the results on the leaderboards. Thank you for participating!
SeaDronesSee Multi-Object Tracking with Reidentification
As in the last MaCVi iteration, there will be prizes for the best team(s) in this challenge. Stay tuned for updates!
Quick links:
Dataset download
Submit
Leaderboard
Ask for help
TLDR
Download the SeaDronesSee Multi-Object Tracking dataset on the
dataset
page. Upload a json-file with predictions on the test set on the
upload page.
Overview
The SeaDronesSee benchmark was recently published at WACV 2022. The goal of this benchmark is to
advance computer vision algorithms in maritime search and rescue missions. It is
aimed at detecting humans, boats and other objects in open
water. See the
Explore page for some annotation examples.
The task of airborne multi-object tracking in the maritime domain is far from solved. The small size of the objects, combined with poor visibility due to waves and sun reflections, makes both detection and tracking hard. What is more, gimbal movement and altitude changes cause objects to move quickly within the video. Occasional partial occlusion of objects adds to these challenges.
A novelty in this challenge is the extension to long-term tracking. That is, you need to reidentify objects that went missing and associate all objects uniquely over the whole video clip. This is particularly challenging for boats and swimmers, which are hard to distinguish from one another. However, every frame comes with metadata that tells you the altitude, viewing angle and other useful information. This may help you associate objects correctly over time.
We provide tracking id labels alongside the bounding box labels for
several videos where the task is to track swimmers and
boats.
Task
Create a multi-object tracker that, given a video, outputs bounding boxes and tracking ids for every object
(boat, swimmer, floater). We group all classes into a
single class. Also see the
dataset
page for an
overview. This task is similar to the MOT-All-Objects-In-Water benchmark introduced in the original paper with minor changes in the annotation file.
This is a long-term tracking task, i.e. objects that disappear from the scene need to be re-identified if they reappear in the same video clip.
Dataset
We provide 21 clips in the train set, 17 clips in the validation set and 19 clips in the test set with a total of 54,105
frames and
403,192 annotated instances.
Note that the video clip ids in the annotation files are not necessarily consecutive integers (some integers are missing).
Furthermore, a few images have a video id that is not contained in the train, val and test annotation files; however, this only applies to a small number of images.
Each image comes with precise metadata labels: altitude, angles of the UAV and the gimbal, GPS, and more.
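As a quick way to inspect which metadata fields are available, something like the following sketch can be used (the file name is a placeholder and the COCO-style 'images' list is an assumption; adapt both to the annotation files you downloaded):
import json

# Placeholder file name; use the annotation file from the dataset download.
with open('instances_train.json', 'r') as f:
    annotations = json.load(f)

# Print the first image entry to see the available metadata fields
# (altitude, UAV/gimbal angles, GPS, ...).
print(annotations['images'][0])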
Note that the train and val sets do not contain long-term tracking labels (objects that went missing are assigned new ids when they reappear). Only the test set contains these annotations, and it is hidden. On the server, the long-term annotation files are used to evaluate your tracker.
Please contact me any time if you find any irregularities (benjamin.kiefer@uni-tuebingen.de).
Evaluation Metrics
We evaluate your predictions on HOTA, MOTA, IDF1, MOTP, MT, ML, FP, FN, Recall, Precision, ID Switches, and Frag. The determining metric for winning is HOTA. In case of a tie, MOTA is the tiebreaker.
Furthermore, we require every participant to submit information on the speed of their method, measured in frames per second of wall-clock time.
Please also indicate the hardware that you used. Lastly, you should indicate which datasets you used during training (including pretraining) and whether you used the metadata.
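A minimal sketch of how the frames-per-second number could be measured, assuming a callable run_tracker_on_frame that wraps your own per-frame detection and tracking step:
import time

def measure_fps(frames, run_tracker_on_frame):
    # Average frames per second over a list of frames, measured in wall-clock time.
    start = time.perf_counter()
    for frame in frames:
        run_tracker_on_frame(frame)  # your detector + tracker update for one frame
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed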
Participate
To participate, perform the following steps:
- Download the dataset SeaDronesSee Multi-Object Tracking on the dataset page.
- Visualize the bounding boxes using these Python scripts or Google Colab notebooks. You will need to adapt the paths accordingly.
- Train a multi-object tracker of your choice on the dataset.
We also provide public detections that you can use so that you do not need to train your own detector. They come from a Yolov7 model trained on the SeaDronesSee-MOT train set for 8 epochs with an AP of roughly 0.5. For reference, the same model (except for the number of class outputs) has an AP of 0.4181 on Object Detection v2. So, it's not the best, but solid.
-
You need to create a json prediction file for upload. In the following, we illustrate the structure of the json based on an example:
See the MOT_example_submission.json. You can load it in Python via
import json

# Load the example submission and parse it into a Python list.
with open('MOT_example_submission.json', 'r') as file:
    data = json.load(file)
This yields a list whose length equals the number of frames in the entire test set (all videos), i.e. 18,253. You can compare that to the length of the list in the corresponding test json on Nextcloud.
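As a sanity check, your own submission file (the file name below is a placeholder) should contain exactly one entry per test frame:
import json

with open('my_submission.json', 'r') as f:
    my_data = json.load(f)

# One entry per frame across all 19 test videos.
assert len(my_data) == 18253, f"Expected 18253 frame entries, got {len(my_data)}"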
The frames of the videos should be ordered according to the following list:
- DJI_0001.mov
- DJI_0051.MP4
- DJI_0065.MP4
- DJI_0001_d3.mov
- DJI_0039.MP4
- DJI_0003.mov
- DJI_0064.MP4
- DJI_0069.MP4
- DJI_0011_d3.mov
- DJI_0057.MP4
- DJI_0032.MP4
- DJI_0001.MOV
- DJI_0010_d3.mov
- DJI_0063.MP4
- DJI_0059.MP4
- DJI_0006_d3.mov
- DJI_0055.MP4
- DJI_0041.MP4
- DJI_0038.MP4
You can also find the video names in the test set json-annotation. Note the difference between DJI_0001.MOV and DJI_0001.mov. Furthermore, note that the number of videos is 19, not 22 or 20!
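A minimal sketch of how per-video predictions could be concatenated into this global frame order; per_video_predictions is a hypothetical dict mapping each test video name to its list of per-frame prediction entries:
# Prescribed test video order (see the list above; names are case-sensitive).
TEST_VIDEO_ORDER = [
    'DJI_0001.mov', 'DJI_0051.MP4', 'DJI_0065.MP4', 'DJI_0001_d3.mov',
    'DJI_0039.MP4', 'DJI_0003.mov', 'DJI_0064.MP4', 'DJI_0069.MP4',
    'DJI_0011_d3.mov', 'DJI_0057.MP4', 'DJI_0032.MP4', 'DJI_0001.MOV',
    'DJI_0010_d3.mov', 'DJI_0063.MP4', 'DJI_0059.MP4', 'DJI_0006_d3.mov',
    'DJI_0055.MP4', 'DJI_0041.MP4', 'DJI_0038.MP4',
]

def build_global_list(per_video_predictions):
    # Concatenate per-video frame entries in the prescribed video order.
    submission = []
    for video_name in TEST_VIDEO_ORDER:
        submission.extend(per_video_predictions[video_name])
    return submission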
Each entry of this submission list (one entry per frame) is itself a list containing a single inner list. That inner list contains one list per predicted object.
If you found no objects in that frame, the inner list is empty, so that the entry is:
[[]]
For example, in the sample submission the first frame does not contain any prediction, i.e.
data[0]=[[]]
If you found objects, the inner list contains one list per predicted object. For example, in the sample submission the 101st frame contains three predicted objects:
data[100]=[[[1.0, 473.14544677734375, 288.2177429199219, 595.2658081054688, 410.7237548828125, 0.9918640851974487],
[4.0, 3776.576171875, 331.8674621582031, 3847.074951171875, 403.09637451171875, 0.7565602660179138],
[3.0, 2151.9912109375, 24.94463348388672, 2346.65087890625, 79.3317642211914, 0.5624877214431763]]]
Each object is itself encoded as a list. For example, see the first object in the 101st frame:
data[100][0][0]=[1.0, 473.14544677734375, 288.2177429199219, 595.2658081054688, 410.7237548828125, 0.9918640851974487]
The numbers mean the following:
object id,bbox_left,bbox_top,bbox_right,bbox_bottom,confidence
The object id must be an integer. The confidence should be a float between 0 and 1.
The coordinates should be self-explanatory.
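Putting this together, a minimal sketch of how per-frame tracker output could be converted into this nesting and written to disk; frame_tracks is a hypothetical list, in the global frame order described above, holding one list of (track_id, left, top, right, bottom, confidence) tuples per frame:
import json

def write_submission(frame_tracks, out_path='my_submission.json'):
    submission = []
    for tracks in frame_tracks:
        # One list per object: [object id, bbox_left, bbox_top, bbox_right, bbox_bottom, confidence].
        # The object id should be an integer value (the example file stores it as a float, e.g. 1.0).
        objects = [[tid, left, top, right, bottom, conf]
                   for tid, left, top, right, bottom, conf in tracks]
        # Each frame entry wraps its objects in a single inner list; [[]] if there are none.
        submission.append([objects])
    with open(out_path, 'w') as f:
        json.dump(submission, f)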
Let us know if you need further information.
- Upload your json-file along with all the required information in the form here. You need
to register first.
- If you provided a valid file and sufficient information, you should see your method on the leaderboard. Now you can iterate. You may upload at most once per day per challenge. Note that the upload can take a few minutes.
Terms and Conditions
- Submissions must be made before the deadline as listed on the
dates page
- You may submit at most once per day per challenge
- The winners are determined by the HOTA (tiebreaker MOTA) metric
- You are allowed to use any publicly available data for training, but you must list it at the time of upload. This also applies to pretraining.
- You may use our public detections for your tracker, but you may also train your own detector.
- Note that we (as organizers) may upload models for this challenge BUT we do not compete for a winning position (i.e. our models do not count on the leaderboard and merely serve as references). Thus, even if your method is worse (in any metric) than one of the organizers' models, you are still encouraged to submit it, as you might still win.
In case of any questions regarding the challenge modalities or uploads, please direct them to Benjamin Kiefer.