Deep-sea polymetallic nodule image segmentation method based on an improved Mask R-CNN model

WENG Zebang, LI Xiaohu, LI Jie, LI Zhenggang, WANG Hao, ZHU Zhimin, MENG Xingwei, LI Huaiming

Journal of Marine Sciences, 2025, Vol. 43, Issue (3): 32-39. DOI: 10.3969/j.issn.1001-909X.2025.03.004

Research Article


Deep-sea polymetallic nodule image recognition method based on an improved Mask R-CNN model


Abstract

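Segmenting optical images of deep-sea polymetallic nodules is hampered by low image contrast, small targets, and blurred boundaries. This study builds an improved Mask R-CNN (mask region-based convolutional neural network) model that incorporates dynamic sparse convolution (DSConv) and the simple parameter-free attention module (SimAM) to recognize and segment polymetallic nodules in deep-sea images. SimAM effectively suppresses interference from the sediment background during nodule recognition, and DSConv effectively alleviates the blurring of nodule boundaries. With both modules included, the improved model reaches an image segmentation accuracy of 91.5%, a precision of 78.0%, a recall of 75.1%, and an intersection over union (IoU) of 69.4%. When the improved and original models were applied to actual survey lines, the proportion of seabed nodule coverage estimates with an error below 5% rose from 57% with the original model to 77% with the improved model. This study offers a reliable technical solution for calculating deep-sea polymetallic nodule coverage, and its modular design can be extended to other object recognition and image segmentation tasks.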
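SimAM reweights activations without adding learnable parameters. The sketch below follows the energy-based formulation of the original SimAM module as we understand it: within one channel, an activation that deviates strongly from the channel mean gets a lower minimal energy and therefore a larger sigmoid gate, which is how a bright nodule response can stand out against a flat sediment background. It is a pure-Python illustration on a single 2-D channel, not the authors' implementation; the function name `simam` and the default `e_lambda=1e-4` are assumptions.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Reweight one feature-map channel (2-D list of floats) with a SimAM-style gate."""
    h, w = len(channel), len(channel[0])
    if h * w < 2:
        raise ValueError("SimAM needs at least two activations per channel")
    n = h * w - 1                                    # neurons other than the target one
    mu = sum(sum(row) for row in channel) / (h * w)  # channel mean
    sq_dev = [[(x - mu) ** 2 for x in row] for row in channel]
    var = sum(sum(row) for row in sq_dev) / n        # channel variance estimate
    out = []
    for row, dev_row in zip(channel, sq_dev):
        new_row = []
        for x, d in zip(row, dev_row):
            # Inverse of the minimal energy: more distinctive activations score higher.
            inv_energy = d / (4 * (var + e_lambda)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            new_row.append(x * gate)
        out.append(new_row)
    return out


# Toy channel: one bright "nodule" response on a flat "sediment" background.
toy = [[0.2, 0.2, 0.2],
       [0.2, 1.0, 0.2],
       [0.2, 0.2, 0.2]]
weighted = simam(toy)
# Gate applied to the centre vs. a background pixel; the centre gate is larger.
print(weighted[1][1] / toy[1][1], weighted[0][0] / toy[0][0])
```

In a real network the same computation would run independently on every channel of a feature tensor; where the authors place SimAM inside Mask R-CNN is not stated in this section.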

Abstract

Optical survey and evaluation of deep-sea polymetallic nodules face challenges such as low contrast, small objects, and ambiguous boundaries. This study proposes an improved Mask R-CNN model that incorporates dynamic sparse convolution (DSConv) and a simple parameter-free attention module (SimAM) for nodule image segmentation. SimAM effectively suppresses sediment background interference, while DSConv alleviates boundary blurring. The combined model achieves an accuracy of 91.5%, a precision of 78.0%, a recall of 75.1%, and an IoU of 69.4%. Applied to actual survey lines, the improved model raised the proportion of nodule coverage estimates with an error below 5% from 57% (original model) to 77%. This research provides a reliable technical solution for calculating deep-sea polymetallic nodule coverage, and its modular design can be extended to other object recognition and image segmentation tasks.
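The four reported scores and the coverage comparison can all be derived from binary masks. The sketch below rests on assumptions the abstract does not spell out: metrics are counted per pixel (nodule = 1, sediment = 0), coverage is the fraction of nodule pixels in an image (for Mask R-CNN output, the union of all predicted instance masks), and the coverage "error" is the absolute difference from a reference coverage. The helper names `pixel_metrics`, `coverage` and `share_within_tolerance` are hypothetical.

```python
def pixel_metrics(pred, truth):
    """Pixel-level scores for two binary masks of equal size (1 = nodule, 0 = sediment)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # nodule pixel found
            elif p:
                fp += 1  # sediment predicted as nodule
            elif t:
                fn += 1  # nodule pixel missed
            else:
                tn += 1  # sediment correctly left out
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


def coverage(mask):
    """Assumed definition: share of image pixels labelled as nodule."""
    pixels = [v for row in mask for v in row]
    return sum(1 for v in pixels if v) / len(pixels)


def share_within_tolerance(predicted, reference, tol=0.05):
    """Fraction of images whose absolute coverage error is below `tol` (assumed error measure)."""
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) < tol)
    return hits / len(reference)


pred = [[1, 1, 0, 0], [1, 0, 0, 0]]
truth = [[1, 0, 0, 0], [1, 1, 0, 0]]
# tp=2, fp=1, fn=1, tn=4: accuracy 0.75, precision and recall 2/3, IoU 0.5
print(pixel_metrics(pred, truth))
print(coverage(pred), coverage(truth))
print(share_within_tolerance([0.30, 0.12, 0.50], [0.28, 0.20, 0.52]))
```

If the paper instead measures relative error, only the comparison inside `share_within_tolerance` would change.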

Keywords

polymetallic nodules / image segmentation / Mask R-CNN model / coverage rate / attention mechanism / dynamic sparse convolution

Key words

polymetallic nodules / image segmentation / Mask R-CNN / coverage rate / SimAM / DSConv

Cite this article

WENG Zebang, LI Xiaohu, LI Jie, et al. Deep-sea polymetallic nodule image recognition method based on an improved Mask R-CNN model[J]. Journal of Marine Sciences. 2025, 43(3): 32-39 https://doi.org/10.3969/j.issn.1001-909X.2025.03.004
CLC number: P744


Funding

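Key Program of the Joint Funds of the National Natural Science Foundation of China (U2244222)
National Key Research and Development Program of China (2023YFC2811305)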
