MarineDet

MarineDet: Towards Open-Marine Object Detection

Haixin Liang^{1 #} Ziqiang Zheng^{1 # 📧} Zeyu Ma² Sai-Kit Yeung¹

¹The Hong Kong University of Science and Technology ²University of Electronic Science and Technology of China
^#co-first author ^📧corresponding author

Paper Dataset (comming soon)

MarineDet dataset consisting of 821 marine-relative object categories to promote and measure open-marine object detection performance

Abstract

Marine object detection has gained prominence in marine research, driven by the pressing need to unravel oceanic mysteries and enhance our understanding of invaluable marine ecosystems. There is a profound requirement to efficiently and accurately identify and localize diverse and unseen marine entities within underwater imagery. The open-marine object detection (OMOD for short) is required to detect diverse and unseen marine objects, performing categorization and localization simultaneously. To achieve OMOD, we present MarineDet. We formulate a joint visual-text semantic space through pre-training and then perform marine-specific training to achieve in-air-to-marine knowledge transfer. Considering there is no specific dataset designed for OMOD, we construct a MarineDet dataset consisting of 821 marine-relative object categories to promote and measure OMOD performance. The experimental results demonstrate the superior performance of MarineDet over existing generalist and specialist object detection algorithms. To the best of our knowledge, we are the first to present OMOD, which holds a more valuable and practical setting for marine ecosystem monitoring and management. Our research not only pushes the boundaries of marine understanding but also offers a standard pipeline for OMOD.

Video

The framework overview

We aim to develop algorithms that can accurately detect, localize, and categorize objects within images, even when faced with previously unseen object categories. Let O represent the set of all possible object categories, where |O | → ∞, indicating an open-ended vocabulary. We detect the presence of objects within the image by generating bounding boxes. We then assign category labels C = {c_i } to the detected objects based on the similarity between the semantic textual embedding of c_i and the regional features identified by the bounding box, wshere c_i corresponds to the category label.

There are two main procedures: 1) pre-training for joint visual-text semantic space construction and 2) marine-specific training.

Qualitative Results

The qualitative comparison between different algorithms. The left part of the dashed line represents the results of seen “Classes” while the right part shows the results of unseen “Classes”.

Citation