Maritime Computer Vision Workshop @ CVPR 2026


Generalist Meta Challenge


Overview

The Generalist Meta Challenge evaluates cross-task generalization across maritime computer vision tasks. Participants are encouraged to develop models that perform well on multiple tasks simultaneously or adapt quickly to new tasks within the maritime domain. Specifically, the current Generalist Meta Challenge focuses on the active challenges (e.g., Vision-to-Chart Data Association, Thermal Object Detection Challenge, LaRS Panoptic Segmentation Challenge, Embedded Semantic Segmentation Challenge, and Multimodal Semantic Segmentation Challenge).

Generalist Definition

A Maritime Generalist model is defined as a model capable of performing effectively across all sub-problems within the maritime domain. There are multiple approaches to developing such a generalist model, and the challenge allows flexibility in the chosen methodology. However, participants must clearly justify why their proposed approach qualifies as a generalist model. This requirement encourages participants to identify and articulate the most effective strategy for building a truly generalist solution for the maritime domain. The challenge outlines several acceptable strategies, but participants are not restricted to the approaches explicitly listed.

  1. No additional training: The same architecture is retained, and the model is repurposed for different tasks using prompt modifications only.
  2. Specialized maritime encoder with task-specific heads: A vision encoder specifically developed for the maritime domain is used, with a dedicated head attached for each problem domain. Note that general-purpose vision encoders (e.g., DINO) do not qualify under this definition, though they may qualify under the definition below. This approach may use up to 100% of the training data from each domain.
  3. General vision encoder with task-specific heads: A general-purpose vision encoder (e.g., DINO) is used, with a dedicated head attached for each problem domain. Under this approach, no more than 50% of the training data from each domain may be used.
  4. Distillation from multiple experts: The same overall architecture is maintained, but the model is trained through distillation from multiple expert models.

Task Description

Participants must develop and submit a method that competes in at least two currently active challenges. Each submission must clearly justify why the proposed approach qualifies as a generalist method. The report must also include a list of the challenges in which the method was evaluated. Submissions are limited to four double-column pages (excluding references). The top three ranked teams will be required to provide code for verification via GitHub. Private repositories are permitted; however, the code must be fully accessible to the organizers for validation.

Evaluation Metrics

Motivation

Leaderboards differ in scale and not every model appears on every benchmark, making raw scores incomparable across them. This metric normalizes ranks to $[0, 1]$ and aggregates via the geometric mean, which penalizes any near-zero score heavily. This approach rewards broad consistency over narrow specialization. Models absent from a leaderboard are assigned a rank one below last place, which discourages selective participation.


Let there be $M$ leaderboards indexed by $i \in \{1, \ldots, M\}$. Leaderboard $i$ contains $N_i \geq 2$ ranked entries (rank 1 is best). Let models be indexed by $j$.

1) Rank Definition

First, we define the per-leaderboard rank $r_{ij}$ for model $j$ on leaderboard $i$ as: $$r_{ij} = \begin{cases} \text{rank of model } j \text{ on leaderboard } i & \text{if present} \\ N_i + 1 & \text{if missing} \end{cases}$$
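For illustration, rank assignment can be sketched in a few lines of Python (an informal sketch, not the official evaluation code; we assume a leaderboard is given as an ordered list of model names, best first):

```python
def rank(leaderboard: list[str], model: str) -> int:
    """Per-leaderboard rank r_ij (1 = best); a model absent from the
    leaderboard receives rank N_i + 1, one place below last."""
    try:
        return leaderboard.index(model) + 1  # list indices are 0-based
    except ValueError:
        return len(leaderboard) + 1
```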

2) Rank Normalization

We convert each rank to an inverted normalized score $s_{ij} \in [0, 1]$ such that: $$r_{ij} = 1 \;\mapsto\; s_{ij} = 1; \qquad r_{ij} = N_i \;\mapsto\; s_{ij} = 0; \qquad r_{ij} = N_i + 1 \;\mapsto\; s_{ij} = 0$$ The normalized score for each model $j$ in each leaderboard $i$ is calculated as: $$s_{ij} = \max\!\left(0,\ 1 - \frac{r_{ij} - 1}{N_i - 1}\right)$$
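The normalization step is a direct translation of the formula above (again an illustrative sketch, not the official scoring code):

```python
def normalized_score(r: int, n: int) -> float:
    """Inverted normalized score s_ij for rank r on a leaderboard of
    n >= 2 entries: rank 1 -> 1.0, rank n -> 0.0, and rank n + 1
    (a missing model) is clamped to 0.0 by the max."""
    return max(0.0, 1.0 - (r - 1) / (n - 1))
```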

3) Consistency Score

The Consistency score $C_j$ is the geometric mean of the per-leaderboard scores. To prevent the geometric mean from collapsing to exactly 0 whenever any $s_{ij} = 0$, we apply a small floor $\varepsilon > 0$ (e.g., $10^{-12}$) to each score before taking logarithms: $$C_j = \exp\!\left(\frac{1}{M} \sum_{i=1}^{M} \log\!\left(\max(\varepsilon,\ s_{ij})\right)\right)$$
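In code, the aggregation might look as follows (an illustrative sketch; the floor value matches the $\varepsilon = 10^{-12}$ example above):

```python
import math

def consistency_score(scores: list[float], eps: float = 1e-12) -> float:
    """Geometric mean of per-leaderboard scores, computed in log space
    with a floor of eps so a zero score penalizes C_j heavily without
    driving it to exactly 0."""
    logs = [math.log(max(eps, s)) for s in scores]
    return math.exp(sum(logs) / len(logs))
```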

Geometric rank aggregation strongly penalizes any leaderboard where the model scores near zero (e.g., bottom-ranked or missing). The score favors models that are consistently strong across leaderboards rather than dominating on only a few.
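Putting the three steps together on a toy example (leaderboards and model names below are hypothetical) shows the intended behavior: a model that is consistently near the top outscores one that wins a single leaderboard but skips another:

```python
# Hypothetical leaderboards, each ordered best to worst.
leaderboards = [
    ["A", "B", "C", "D"],  # A wins, B second
    ["B", "A", "C"],       # B wins, A second
    ["A", "C"],            # B is missing entirely
]

for model in ["A", "B", "C"]:
    scores = [normalized_score(rank(lb, model), len(lb))
              for lb in leaderboards]
    print(model, consistency_score(scores))
# A is near the top everywhere and scores highest; B's missing
# entry drives its consistency score toward zero.
```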

If you have any questions, please join the MaCVi Support forum.